Purpose of the Document
The aim of these supplementary details is to provide readers with better documentation of the complete modeling and model selection process. A total of 18 models were fit as part of this study, and the number of analyses and supporting figures needed to understand this process is simply too large for a single document. This document therefore provides relevant plots and summaries to help readers understand the steps taken to arrive at the final model and, perhaps more importantly, commentary on how each modeling choice was made.
The goal of this document is therefore to ensure complete transparency in the data analysis process. Accompanying this document is the R script file created during the study. Anyone who acquires access to the appropriate files through the HRS should be able to run the script (with minimal alterations to ensure that the data are read from the correct file paths) and obtain results matching those shown here (within some margin of error due to randomness in some of the methods, though the use of the same seed in R and brms should reduce the impact of this).
Note that, by default, the code used to generate the results in this document is hidden. As some readers may be interested in this code, its display can be turned on by toggling the switch in the top right of this page. Every output in the document also has a toggle button (a small button labeled “code”) that, when pressed, shows the code used to produce that output. Additionally, the raw Markdown file (*.rmd) is available on GitHub along with the R script.
Overview of Sections
To facilitate readability, the modeling process is broken down into several sections. A table of contents allows readers to jump to any section, so for those interested in specific aspects of this supplementary material, a brief summary of what each section contains is provided here.
First, a flowchart is provided showing the order in which models were built and the model against which each was compared. Each candidate model was compared to the best-fitting model up to that point. The aim of this flowchart is to familiarize readers with the models and modeling steps quickly and efficiently before the remaining sections add greater detail.
Second, the priors used in the models are explored. Because the analyses were conducted in a Bayesian framework, inspection of the priors is important. The priors are shown along with prior predictive checks for an intercept-only model.
Third, the details of each model are provided. This is the largest section, as each model includes a variety of explorations. To help reduce the overall length of the document, each model is given its own tab so that readers can view one model at a time. The details for each model break down into three primary subsections: model validity, model performance, and model estimates.
- Model validity refers to tests of whether the estimation process converged and was not subject to any issues that would make estimates from the model invalid or unstable. These tests include visual inspection of chain mixing, \(\hat{R}\), effective sample size, and maximum treedepth.
- Model performance refers to the posterior predictive checks that mirror those shown in the manuscript for the final model: predictions of all responses for all items, responses to each item, and responses to all items for a subset of participants. As space in this document is less limited than in the manuscript, the random subset of participants is increased from 6 to 20.
- Model estimates refers to the summary of model parameters such as the fixed and random effects estimates. This summary differs from the one presented in the manuscript because the objective of these intermediary models is not to summarize effect sizes or the probability of these effects; instead, the goal is to get a general idea of what each model is estimating and how it is performing. Toward this end, conditional effects plots for each model are also included. Note that these plots may not be very informative for the majority of the models tested because only a subset included many covariates.
Fourth, the results of the model comparisons are provided. These are the formal comparisons using the leave-one-out information criterion (LOOIC) and pseudo-Bayesian model averaging (pseudo-BMA) weights. Not all models were compared to one another, so it can be helpful to return to the flowchart to see which model comparisons were made.
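For readers unfamiliar with pseudo-BMA, the weights reported in that section are derived from each model's expected log predictive density (elpd). A minimal sketch of the weighting step is shown below in plain Python (without the Bayesian bootstrap regularization that the loo package's implementation can apply); the elpd values are hypothetical.

```python
import math

def pseudo_bma_weights(elpds):
    """Convert expected log predictive densities (elpd) into
    pseudo-BMA model weights: w_k proportional to exp(elpd_k)."""
    m = max(elpds)  # subtract the max for numerical stability
    raw = [math.exp(e - m) for e in elpds]
    total = sum(raw)
    return [r / total for r in raw]

# hypothetical elpd values for three candidate models
weights = pseudo_bma_weights([-1402.3, -1399.8, -1400.5])
```

Because the weights scale exponentially in elpd, a model with even a modestly higher elpd can receive most of the weight.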
Finally, as discussed in the manuscript, some additional details regarding the final model are also included here. These details include summaries of the item parameters in traditional IRT metrics (i.e., as difficulty and discrimination), the reliability plot, expected score functions (total test and first trial), and test information functions (total test and first trial). The same caveat as in the manuscript is made here: the item parameters are estimated assuming the local dependency effect is 0 for every item on all three trials. This is not a realistic assumption, but these effects are unique for every response pattern, making summarizing the various effects overwhelming. The test is best understood as a dynamic one in which a person’s performance on each trial changes our expectation for how they will perform on the next; however, a gross simplification that captures the general expectation of how items will perform can be obtained by ignoring this dynamic element (as done here).
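As a side note on the traditional IRT metrics mentioned above, the conversion from the model's easiness and log-discrimination parameters is a simple transformation. The sketch below assumes a 2PL predictor of the form logit(p) = exp(logalpha) * (theta + beta); the exact mapping depends on the non-linear formula used in the model, and the example values are hypothetical.

```python
import math

def to_traditional_irt(beta, logalpha):
    """Map easiness (beta) and log-discrimination (logalpha) to
    traditional IRT difficulty and discrimination, assuming the
    2PL predictor logit(p) = exp(logalpha) * (theta + beta)."""
    discrimination = math.exp(logalpha)
    difficulty = -beta  # theta value at which P(correct) = 0.5
    return difficulty, discrimination

# hypothetical item: easiness 0.8, log-discrimination 0.3
b, a = to_traditional_irt(0.8, 0.3)
```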
What’s Being Done
As mentioned earlier, this document contains (hidden by default) all the code used to generate the results. Since there is an accompanying R script, it may be useful for readers to know the objects being called in this Markdown document, since those objects can be connected back to the R script. The hope is that this will create a reasonable sense of cohesion between the supplementary materials, and it should mean that all the results here are also fully reproducible. Toward that end, the objects and packages used in this document are shown below (note that the R objects are read in as RDS files whose names are consistent with those listed in the R script file).
#read in needed data
df_long <- readRDS("Data_long.rds")
Rasch_prior <- readRDS("Fitted Models/1PL_prior_check.rds")
TwoPL_prior <- readRDS("Fitted Models/2PL_prior_check.rds")
Rasch_inter <- readRDS("Fitted Models/1PL_intercept.rds")
TwoPL_inter <- readRDS("Fitted Models/2PL_intercept.rds")
TwoPL_ranItemTrialsInt <- readRDS("Fitted Models/2PL_random_items_allThree_int.rds")
TwoPL_ranItemTrialsNoInt <- readRDS("Fitted Models/2PL_random_items_allThree_noInt.rds")
TwoPL_ranItemTrial1Int <- readRDS("Fitted Models/2PL_random_items_trialOne_int.rds")
TwoPL_ranItemTrial1NoInt <- readRDS("Fitted Models/2PL_random_items_trialOne_noInt.rds")
TwoPL_depmd <- readRDS("Fitted Models/2PL_dependency_model.rds")
TwoPL_depea <- readRDS("Fitted Models/2PL_dependency_easinessOnly.rds")
TwoPL_depsi <- readRDS("Fitted Models/2PL_dependency_trialOne.rds")
TwoPL_learn <- readRDS("Fitted Models/2PL_learningModel.rds")
TwoPL_multi <- readRDS("Fitted Models/2PL_multidimensional.rds")
TwoPL_srlps <- readRDS("Fitted Models/2PL_serialPosition.rds")
TwoPL_t3spe <- readRDS("Fitted Models/2PL_serialPosition_trialUnique.rds")
TwoPL_itmex <- readRDS("Fitted Models/2PL_itemCovariates.rds")
TwoPL_itmcp <- readRDS("Fitted Models/2PL_itemCovariates_byTrial.rds")
TwoPL_itmcr <- readRDS("Fitted Models/2PL_itemCovariates_uniqueInteractions.rds")
TwoPL_itmsd <- readRDS("Fitted Models/2PL_itemCovariates_simplified.rds")
TwoPL_itmfn <- readRDS("Fitted Models/2PL_itemCovariates_reduced.rds")
#load in required packages
#wrapping in suppressPackageStartupMessages() done just to reduce print out in document
suppressPackageStartupMessages(library(brms))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(kableExtra))

Modeling Flowchart
As can be inferred from the figure below, the modeling process involved many iterations of fitting similar models and building on results from previous models. As there were specific hypotheses regarding the model and its covariates, the models needed to test these hypotheses were naturally created as part of the study. At the same time, alternative models had to be specified against which to test these models. Additionally, while there were hypotheses regarding what covariates (and their signs) would ultimately be in the final model, the important result of the study is the final model itself. Some readers may see the number of models examined and the variations in their specification and become concerned about two potential issues: multiple comparisons with inflated error rates and/or model fishing (p-hacking).
The objective of this supporting document is to help clarify all the modeling choices so that readers do not need to question whether model specifications were made to try to improve performance of the final result. With respect to possible concerns regarding multiple comparisons, Bayesian methods do not suffer from these concerns (Gelman et al., 2013; Gelman, Hill, & Yajima, 2012; Neath, Flores, & Cavanaugh, 2017; Sjölander & Vansteelandt, 2019). While there are several reasons that this is the case for Bayesian methods, it is sufficient to speak to three. First, we do not use null hypothesis testing in this study. Model comparisons are completed using a formal comparison of information criteria to select models with better out-of-sample performance. Coefficients are not interpreted as significant or not; instead, they are summarized in terms of their probability of existing. Since the correction to p-values for multiple comparisons controls the risk of falsely rejecting the null hypothesis, this is not a concern when we are not rejecting null hypotheses. Second, we utilize skeptical priors for our effect estimates. This means that we a priori place greater probability on the effects being 0 (or practically equivalent to 0). This is the inverse of frequentist decision-making practices, where the null hypothesis is very easy to reject since it is constrained to (usually) a nil point value, which is rarely a realistic threshold for any model specification. Finally, the models benefit from hierarchical partial pooling of effects, meaning that estimates are pulled away from extreme values and closer to the overall mean. Combined with the skeptical priors, this means that all effect estimates are pulled closer to those a priori small effects.
For anyone concerned that models were fit simply to improve performance, the remainder of this document aims to explain each modeling decision. While some of the model specifications are discussed in detail in the manuscript, specific justifications and rationales for adding or dropping covariates could not be explained there, so this document takes that next step. While every modeling decision was based on trying to find the best performing model, the study still utilized a priori hypotheses about each predictor, and each predictor was tested in order to test these hypotheses. Once these hypotheses are tested, however, it is still important to have a final model that summarizes these findings (i.e., one that includes only the most relevant predictors).
Flowchart of all models fit and compared in order (click to make larger)
Prior Specifications and Inspection
As discussed in the manuscript for this study, the prior specification came from documentation on using brms for IRT (i.e., Bürkner, 2020a and Bürkner, 2020b). As a general note, the non-linear specification for the 2PL model used in this study comes from the Bürkner (2020b) study published in the Journal of Intelligence. Also as discussed in the manuscript, the specification of priors follows the recommendations of other typical multilevel regression guides (e.g., Gelman & Hill, 2007). Specifically, the priors are normal distributions with wide variances relative to the scale of the outcome data. As these priors are placed on regression coefficients, normal distributions are appropriate prior distributions. While these distributions are centered on 0, they are made wide and thus only weakly informative to the final parameter estimates. This specification helps regularize estimation (i.e., pull estimates toward zero and away from more extreme values) while imparting little a priori influence on the estimates. Additionally, by making the priors skeptical (i.e., they place the greatest probability on very small to non-existent effects), there is a reduction in the risk of experimenter bias; however, with 1219 participants each observed 30 times, the data will dominate the prior anyway.
The priors for the Rasch and 2PL models are shown below:
Rasch_priors <-
prior("normal(0, 2)", class = "Intercept") +
prior("normal(0, 3)", class = "sd", group = "ID") +
prior("normal(0, 3)", class = "sd", group = "Item")
TwoPL_priors <-
prior("normal(0, 2)", class = "b", nlpar = "beta") +
prior("normal(0, 1)", class = "b", nlpar = "logalpha") +
prior("normal(0, 1)", class = "sd", group = "ID", nlpar = "theta") +
prior("normal(0, 3)", class = "sd", group = "Item", nlpar = "beta") +
prior("normal(0, 1)", class = "sd", group = "Item", nlpar = "logalpha")Readers following the R script file will recognize that the above are repeated in that document (lines 310-321). To read these priors, it can be helpful to look at a couple of examples. Starting with the priors for the Rasch model, the prior for the intercept is specified as a normal distribution with a mean of zero and standard deviation of 2. This would mean that, before looking at the data, we are guessing that there is a 68% probability of the intercept being between -2 and 2 (i.e., +/- 1SD), and we are guessing that there is about a 95% probability that the intercept will be between -4 and 4 (i.e., +/- 2SD). Since the model uses a logit transformation, these values correspond to odds ratios of \(e^{\beta_0}\) where \(\beta_0\) is the intercept. Clearly, this range is very large (e.g., ~0.02 - 54.60 at 2SD), which is what makes the prior specification weakly informative. Take another example but this time from the 2PL model and a random effect. The random effect priors are all labeled as class = "sd" since we are putting a prior belief on the plausible values of the standard deviation of the random effects. For the random person effect (i.e., latent trait of each participant), we look for the variable that defines this group (group = "ID", where ID is a number indexing each participant) and the non-linear element it is estimating (nlpar = "theta" where theta is the traditional IRT simple for the latent trait). The prior therefore for the latent trait is the standard normal distribution with mean of zero and standard deviation of 1. This specification is consistent with the treatment of the latent trait in IRT as normally distributed, though generally in IRT models the variance is constrained to be 1 for identifiability purposes (see Bürnker, 2020a for details).
While it’s comforting to be able to go through each prior specification and think about what it means, it is perhaps more efficient to examine some plots. The first set of plots shows the parameter estimates returned when the model samples only from the prior; in other words, these are the estimated effects implied by the priors. If the prior specifications are truly skeptical and weakly informative, then they will give the greatest probability to very small effects while covering a wide range of plausible values. These effect estimates for an intercept-only Rasch model are shown below.
plot(Rasch_prior, combo = c("dens", "intervals"))

Consistent with expectations, these plots show wide ranges of plausible values, with the greatest probability placed on small effects. The left column shows density plots of the estimates, while the right column shows interval estimates (circle = mean estimate, bold line = 50% credible interval, thin line = 90% credible interval). The intercept density clearly follows a normal distribution that permits the full range of admissible values while assigning more extreme estimates a small overall probability. The standard deviations for the random effects are also appropriate for the data: they give no impossible (i.e., negative) estimates and cover a very wide range of possible values. These wide priors allow the data to dominate the posterior estimates of the parameters, though again this would likely be the case even with more informative priors due to the size of the available sample.
plot(TwoPL_prior, N = 3, combo = c("dens", "intervals"), ask = FALSE)

The plots above mirror those for the Rasch model but are for the 2PL model, which necessarily estimates more parameters. Again, the intercept priors make reasonable sense, and the standard deviations yield only appropriate estimates. Note that there is also a prior on the correlation between item easiness (“beta”) and discrimination (“logalpha”). This prior is not specified in the earlier code because the default in brms is the LKJ prior with \(\eta\) = 1, which is the preferred weakly informative prior for a correlation matrix. As is apparent from the figure, the LKJ prior is close to a uniform distribution bounded by -1 and 1; however, unlike a true uniform distribution, the LKJ prior gives less probability to correlations close to perfect. It is thus fairly ideal as a prior for correlations: it is bounded appropriately (i.e., between -1 and 1) and provides little a priori information about the correlations except to say that near-perfect correlations are unlikely, which is almost always a sensible assumption for any data.
Another related graphical output is the prior predictive check. The prior predictive check runs the model using the priors rather than the observed data (note that this model would be similar to a null model). If the priors are specified well, then they should return reasonable, albeit widely varying, estimates of the observed data. The prior predictive checks for the Rasch and 2PL models are shown below following the same layout as the posterior predictive checks in the manuscript and for the other models.
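Conceptually, a prior predictive check repeatedly (1) draws hyperparameters and parameters from the priors and (2) simulates a full data set from those draws, with the observed data playing no role. A stripped-down sketch of this logic for a Rasch-style model is below; the dimensions and the half-normal treatment of the sd priors are illustrative assumptions, not the brms implementation.

```python
import math
import random

random.seed(1)

def prior_predictive_draw(n_persons=50, n_items=10):
    """One prior predictive draw for a Rasch-style model: sample
    parameters from the priors, then simulate Bernoulli responses."""
    intercept = random.gauss(0, 2)        # Normal(0, 2) intercept prior
    sd_person = abs(random.gauss(0, 3))   # half-Normal(0, 3) for sd(ID)
    sd_item = abs(random.gauss(0, 3))     # half-Normal(0, 3) for sd(Item)
    thetas = [random.gauss(0, sd_person) for _ in range(n_persons)]
    betas = [random.gauss(0, sd_item) for _ in range(n_items)]
    responses = []
    for th in thetas:
        for b in betas:
            p = 1 / (1 + math.exp(-(intercept + th + b)))
            responses.append(1 if random.random() < p else 0)
    return responses

# proportion correct across 25 prior predictive draws
draws = [sum(prior_predictive_draw()) / 500 for _ in range(25)]
```

If the priors are reasonable, these simulated proportions should vary widely but still cover the kind of response rates the real data could plausibly show.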
pp_check(Rasch_prior, nsamples = 25, type = "bars")
pp_check(Rasch_prior, nsamples = 25, type = "bars_grouped", group = "Item")
pp_check(Rasch_prior, nsamples = 25, type = "bars_grouped", group = "ID",
         newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 20, replace = FALSE))))

The Rasch prior predictive checks above demonstrate that the prior specifications are adequately wide to provide coverage for the observed data. The estimates themselves are expectedly poor, but the important part is that their ranges are sufficiently wide for the observed data to fall within the credible intervals. The same plots are repeated below for the 2PL model.
pp_check(TwoPL_prior, nsamples = 25, type = "bars")
pp_check(TwoPL_prior, nsamples = 25, type = "bars_grouped", group = "Item")
pp_check(TwoPL_prior, nsamples = 25, type = "bars_grouped", group = "ID",
         newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 20, replace = FALSE))))

Performance of the 2PL priors is similar to that of the Rasch priors, suggesting that these priors are also appropriately specified.
While all the above plots and theoretical justifications suggest that the priors are specified consistent with the intentions for the model, it can also be helpful to perform a post-hoc test of whether a model's priors were influential on its final estimates. As has been mentioned multiple times in this section, due to the sample size of this study, it is expected that the data and not the prior will dominate the posterior estimates, meaning that even with more informative priors the data would have more influence on the final estimates. One such comparison discussed by Andrew Gelman (link here) is to compare the posterior standard deviation (i.e., the precision of the effect estimate after looking at the data) to the prior standard deviation (i.e., the uncertainty of the effect estimate before looking at the data). In the case that a prior is influential, the ratio of the posterior to prior standard deviation will be large. Put another way, we learn little more about the posterior from observing the data because the prior was already highly informative. Gelman's recommended threshold for flagging a prior as informative is a posterior standard deviation greater than 0.1 times the prior standard deviation. The table below provides this metric for each predictor from the final model reported in the study. As a note, these are the only parameters examined for sensitivity to the prior as these are the only ones on which inference was conducted.
#get the posterior samples from the final model
posteriors <- posterior_samples(TwoPL_itmfn)
#get the fixed effects for the item easiness
beta <- posteriors %>%
select(starts_with("b_beta")) %>%
apply(., 2, function(x) sd(x)/sd(posteriors$prior_b_beta)) %>%
as.matrix()
#do the same for item discrimination
alpha <- posteriors %>%
select("b_logalpha_Intercept") %>%
apply(., 2, function(x) sd(x)/sd(posteriors$prior_b_logalpha)) %>%
as.matrix()
#combine into a single result
result <- rbind(beta, alpha) %>%
as.data.frame() %>%
add_column("Prior Influence" = ifelse(.[, 1] >= 0.1, "Informative", "Uninformative")) %>%
rename("Ratio (Posterior:Prior)" = V1) %>%
slice(1, 18, 2:11, 14:15, 17, 16, 12:13)
row.names(result) <- c("Easiness Intercept", "Discrimination Intercept", "Dependency: Butter", "Dependency: Arm", "Dependency: Shore", "Dependency: Letter", "Dependency: Queen", "Dependency: Cabin", "Dependency: Pole", "Dependency: Ticket", "Dependency: Grass", "Dependency: Engine", "HAL Frequency: Trial 1", "HAL Frequency: Trial 2 or 3", "Age of Acquisition: Trial 1 or 2", "Age of Acquisition: Trial 3", "Body-Object Integration", "Concreteness")
rm(posteriors, beta, alpha)
#get resulting table
result %>%
kable(caption = "Comparison of the Posterior to Prior Distribution Standard Deviations", digits = 4, align = 'cc') %>%
column_spec(1:3, bold = ifelse(result$`Ratio (Posterior:Prior)` >= 0.10, TRUE, FALSE)) %>%
kable_classic(full_width = FALSE, position = "float_right")

|  | Ratio (Posterior:Prior) | Prior Influence |
|---|---|---|
| Easiness Intercept | 0.0589 | Uninformative |
| Discrimination Intercept | 0.7092 | Informative |
| Dependency: Butter | 0.0344 | Uninformative |
| Dependency: Arm | 0.0411 | Uninformative |
| Dependency: Shore | 0.0351 | Uninformative |
| Dependency: Letter | 0.0299 | Uninformative |
| Dependency: Queen | 0.0333 | Uninformative |
| Dependency: Cabin | 0.0309 | Uninformative |
| Dependency: Pole | 0.0314 | Uninformative |
| Dependency: Ticket | 0.0293 | Uninformative |
| Dependency: Grass | 0.0347 | Uninformative |
| Dependency: Engine | 0.0300 | Uninformative |
| HAL Frequency: Trial 1 | 0.2350 | Informative |
| HAL Frequency: Trial 2 or 3 | 0.0840 | Uninformative |
| Age of Acquisition: Trial 1 or 2 | 0.0753 | Uninformative |
| Age of Acquisition: Trial 3 | 0.1457 | Informative |
| Body-Object Integration | 0.0595 | Uninformative |
| Concreteness | 0.0552 | Uninformative |
The table of these posterior and prior comparison results is shown here. For convenience, only those parameters whose ratio exceeds the recommended 0.10 threshold are bolded. Generally, the findings suggest that the priors performed as expected: they were weakly informative and did not have undue influence on the posterior estimates. Notably, the exceptions correspond to parameter estimates whose 95% credible intervals included zero (i.e., those that might be considered non-significant). This is not unexpected, as the result essentially indicates that even after looking at the data we do not change our prior beliefs, which were that the effects were either zero or very small. The influence of the prior on these null findings reflects a point made earlier regarding how Bayesian methods are generally unaffected by multiple comparisons.
An important point to emphasize at this juncture is the implication of a “significant” finding in Bayesian methods. As discussed throughout this section on priors, the priors here are skeptical of an effect in the sense that they place the greatest weight on an effect estimate at or near zero while remaining ambivalent regarding the direction of the effect (i.e., a positive sign is just as probable as a negative one). In the context of the current study, this means that, despite the hypotheses regarding the presence and direction of specific effects, the priors for these predictors are specified in this skeptical way so as to avoid the introduction of experimenter bias. In regard to the robustness of the effects observed, the fact that they emerge from the information provided by the data despite these skeptical priors also helps build confidence in their presence.
Model Details
The details of this section highlight the model fitting and results. These details speak first to the validity of the model results and then to the actual results (i.e., parameter estimates) of each model. Model validity is particularly important in Bayesian methods because the parameter estimates are based on Markov chain Monte Carlo sampling (or, for these models run in Stan, Hamiltonian Monte Carlo (HMC)). In cases where a model fails to converge or throws errors under the estimator, the validity of the model results is questionable, or the results may even be completely invalid (e.g., in the case of divergent transitions). To reflect this need to first confirm the validity of the results, various diagnostics of the model fit are provided before the model results are presented.
For readers unfamiliar with these model checks, a brief overview of each is provided here. The largest threat to model results in HMC is arguably the presence of divergent transitions. HMC explores the posterior distribution by simulating the evolution of a Hamiltonian system, and in order to do this efficiently, the sampler finds a reasonable step size with which to explore that space. A divergent transition occurs when the trajectory of the system is lost because the step size is too large. Another important model check is the treedepth of the chains. Again, to improve the efficiency of posterior sampling, a maximum treedepth is set to prevent the estimator from spending excessive time in certain steps and spaces. Since this treedepth may artificially limit the estimator's exploration of the posterior space, it is important to check whether any of these treedepths were actually hit during estimation (the default treedepth is 10). Another important Bayesian model indicator is \(\hat{R}\), because multiple HMC (or MCMC) chains are needed to ensure that the posterior is sampled appropriately. If a single chain is run, then it is not possible to determine whether the random starting values of that chain led to a specific set of parameter estimates. Running multiple independent chains, each with different random starting values, helps ensure that the parameter estimates are not biased by exploration of only certain posterior regions. In well-behaved models, these chains will mix together without any clear indication of one chain producing a set of parameter estimates that differs from what the other chains are estimating. While this mixing can be visually inspected via the trace plot (also provided here), the \(\hat{R}\) statistic is a simple numeric indicator of it, with the conservative recommendation of treating estimates as valid only if \(\hat{R}\) for the parameter is less than 1.01.
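For intuition, the basic (non-rank-normalized) split-\(\hat{R}\) computation can be sketched as follows; Stan's current diagnostic adds rank normalization and folding on top of this idea. The simulated chains here are hypothetical, well-mixed draws, so the statistic should land close to 1.

```python
import random
import statistics as stats

def split_rhat(chains):
    """Basic split-R-hat: split each chain in half, then compare
    between-half variance (B) to within-half variance (W)."""
    halves = []
    for c in chains:
        mid = len(c) // 2
        halves.append(c[:mid])
        halves.append(c[mid:mid * 2])
    n = len(halves[0])
    means = [stats.fmean(h) for h in halves]
    w = stats.fmean([stats.variance(h) for h in halves])  # within-half
    b = n * stats.variance(means)                         # between-half
    var_plus = (n - 1) / n * w + b / n                    # pooled estimate
    return (var_plus / w) ** 0.5

# two well-mixed hypothetical chains of independent draws
random.seed(7)
chains = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
rhat = split_rhat(chains)
```

If one chain were stuck in a different region, the between-half variance would inflate and the statistic would exceed 1.01.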
A final model validity check shown here is the effective sample size. Because multiple chains are run for many samples of the posterior, it is expected that some of those samples are autocorrelated and thus dependent on previous samples. Effective sample size informs us of the precision of estimates in MCMC and HMC methods. When samples are independent, the central limit theorem indicates that the precision with which a parameter can be estimated is proportional to the size of the sample (e.g., \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}\)). The same proportionality holds when samples are dependent but requires replacing \(N\) with the effective sample size, \(N_{ESS}\). Due to the dependence of the sampling, \(N_{ESS} < N\), and thus the precision of the estimate is less than it would be if it could be estimated from the total sample. Running the chains for more iterations will necessarily increase \(N_{ESS}\), but there is a practical trade-off between computational effort and marginal gains in precision. The recommendation of the Stan developers is to run enough iterations of the sampler to obtain \(N_{ESS} \geq 100 \times N_{chains}\). All models were run using 4 independent chains, so the minimally acceptable ESS is 400 (i.e., 100*4).
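The idea behind the effective sample size can be sketched with the standard formula \(N_{ESS} = N / (1 + 2\sum_t \rho_t)\), where \(\rho_t\) is the lag-\(t\) autocorrelation. The crude estimator below truncates the sum at the first non-positive autocorrelation and uses hypothetical AR(1) draws; Stan's actual estimator is more sophisticated.

```python
import random
import statistics as stats

def effective_sample_size(x, max_lag=200):
    """Crude ESS estimate: N / (1 + 2 * sum of autocorrelations),
    truncating the sum at the first non-positive autocorrelation."""
    n = len(x)
    mean = stats.fmean(x)
    var = stats.pvariance(x)
    acf_sum = 0.0
    for lag in range(1, max_lag):
        cov = sum((x[i] - mean) * (x[i + lag] - mean)
                  for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            break
        acf_sum += rho
    return n / (1 + 2 * acf_sum)

# strongly autocorrelated AR(1) draws -> ESS well below N
random.seed(3)
x, val = [], 0.0
for _ in range(2000):
    val = 0.9 * val + random.gauss(0, 1)
    x.append(val)
ess = effective_sample_size(x)
```

With an autoregressive coefficient of 0.9, the 2000 dependent draws carry far less information than 2000 independent ones, which is exactly what the shrunken ESS reflects.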
In the case that the model checks are passed, it is appropriate to examine the posterior distribution and begin inference based on the results. While the models estimate more parameters, posterior summaries of the coefficients in each model are shown. The posterior is summarized as a density plot that reflects the probability distribution of the parameter after integrating our prior knowledge with the observed data. The density plot shows the 95% credible interval, with the 80% credible interval shown as a shaded area. Unlike frequentist confidence intervals, these credible intervals can be interpreted as the probability of the parameter taking a value in a given range. For example, if the 95% credible interval ranges from 0.50 to 1.00, then there is a probability of 0.95 that the parameter has a value somewhere within this interval. This is in contrast to frequentist confidence intervals, where the same interval would be interpreted as meaning that 95% of point estimates based on the same statistical test applied to an infinite number of random samples of the same population would fall within this interval. Thus, where the credible interval directly summarizes our beliefs about the parameter and our uncertainty about its true value, the confidence interval only reflects point estimates that we would expect to observe if the study and statistical methods were repeated an infinite number of times.

Posterior predictive checks of the models are then also presented, as they were for the final model in the corresponding manuscript. There is one additional model exploration plot provided in this section that has not been addressed before in this document: the conditional effects plot. As the predictors in these models are correlated and have their effects estimated on the logit scale, it can be challenging to look at the model estimates and understand the implication of these values in an intuitive manner.
One way to address this is to visualize how the predicted outcome of the model changes as a function of each predictor while holding all other predictors constant (e.g., at their mean value). The resulting plot is the conditional effects plot. In the case of these models, this plot shows, for each predictor, what happens to the predicted probability of a correct response as the value of the predictor changes and all other model values are held constant. These plots are not realistic as it is not reasonable to assume that there exist words whose traits can vary only on one property at a time; however, they do provide a quick method of understanding the relative effect of each predictor by showing its linear trend as implied by the model. As a result, these plots should not be used for prediction or extrapolation in any regard; instead, if the goal is prediction of responses, then the entire model should be used, and extrapolation of these predictions to values not observed in this study should be avoided. These plots are simply to help contextualize the meaning of the effect estimate in the model.
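The logic of a conditional effects plot reduces to a simple computation: sweep one predictor across a grid while holding the others at a reference value, then transform each linear predictor through the inverse-logit link. The coefficients and predictor names below are hypothetical and purely illustrative.

```python
import math

def inv_logit(eta):
    """Inverse-logit link: map a logit-scale value to a probability."""
    return 1 / (1 + math.exp(-eta))

# hypothetical logit-scale model: intercept plus two predictors
intercept, b_freq, b_aoa = 0.6, 0.45, -0.30
mean_aoa = 0.0  # second predictor held at its (centered) mean

# conditional effect of "frequency": sweep it over a grid,
# hold the other predictor constant
freq_grid = [-2 + 0.5 * i for i in range(9)]  # -2 .. 2
probs = [inv_logit(intercept + b_freq * f + b_aoa * mean_aoa)
         for f in freq_grid]
```

Plotting `probs` against `freq_grid` yields one panel of a conditional effects plot: the predicted probability of a correct response as a function of a single predictor, all else held constant.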
Rasch Intercept
mcmc_plot(Rasch_inter, type = "nuts_divergence")
mcmc_plot(Rasch_inter, type = "nuts_treedepth")
mcmc_plot(Rasch_inter, type = "trace")
## No divergences to plot.
mcmc_plot(Rasch_inter, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(Rasch_inter, type = "neff_hist", binwidth = 0.1)
The intercept-only Rasch model showed no evidence of estimation problems that would call the validity of the results into question. As a result, we can examine the overall model results.
mcmc_plot(Rasch_inter, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(Rasch_inter, nsamples = 50, type = "bars")
pp_check(Rasch_inter, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(Rasch_inter, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Even with no predictors and fewer parameters than the 2PL models, the Rasch model does very well in predicting responses. As this is an intercept-only model, there is only one population-level parameter to summarize from the posterior. The density plot of the intercept demonstrates a high probability that the intercept is positive, with a mode around 0.65. The following general model summary integrates basic model validity statistics with posterior summaries for the remaining parameters.
summary(Rasch_inter)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ 1 + (1 | Item) + (1 | ID)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.53 0.02 0.50 0.57 1.00 3575 4956
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.51 0.15 0.31 0.88 1.00 1921 2978
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.66 0.17 0.31 0.99 1.00 1079 1562
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
To help clarify the meaning of the major elements of the above output, consider the following guide:
1. "Estimate" is the mean of the posterior distribution for the parameter.
2. "Est.Error" is the standard deviation of the posterior distribution.
3. "l-95% CI" is the lower bound of the 95% credible interval.
4. "u-95% CI" is the upper bound of the 95% credible interval.
5. "Rhat" is the \(\hat{R}\) value for that parameter (rounded to two decimal places).
6. "Bulk_ESS" is the effective sample size based on rank-normalized draws and estimates the sampling efficiency for the mean of the posterior.
7. "Tail_ESS" is the minimum of the effective sample sizes for the 5% and 95% quantiles.
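For readers following along with the script, a model with this structure could be specified in brms roughly as follows. This is a sketch rather than the exact call from the script: the chains, iterations, and warmup match the summary output above, but the seed shown is a placeholder, and any prior specifications from the script are omitted.

```r
library(brms)

# Sketch of the intercept-only Rasch model: crossed random intercepts
# for persons (ID) and items (Item) on a Bernoulli outcome.
Rasch_inter <- brm(
  Resp ~ 1 + (1 | Item) + (1 | ID),
  data   = df_long,
  family = bernoulli(link = "logit"),
  chains = 4,
  iter   = 3000,  # 1000 warmup + 2000 sampling per chain (8000 draws total)
  warmup = 1000,
  seed   = 1234   # placeholder; the script fixes its own seed
)
```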
2PL Intercept
mcmc_plot(TwoPL_inter, type = "nuts_divergence")
mcmc_plot(TwoPL_inter, type = "nuts_treedepth")
mcmc_plot(TwoPL_inter, type = "trace")
## No divergences to plot.
mcmc_plot(TwoPL_inter, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_inter, type = "neff_hist", binwidth = 0.1)
The intercept-only 2PL model showed no evidence of estimation problems that would call the validity of the results into question. As a result, we can examine the overall model results.
mcmc_plot(TwoPL_inter, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_inter, nsamples = 50, type = "bars")
pp_check(TwoPL_inter, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_inter, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Much like the Rasch model, the intercept-only 2PL model does very well in predicting responses. Although this is an intercept-only model, two intercepts are estimated (one for difficulty/beta and one for discrimination/alpha) because the 2PL model is a non-linear model. Note that the model's name for the discrimination parameter is "logalpha." This name reflects the fact that the alpha parameter was log-transformed to constrain its estimates to positive values. The following general model summary integrates basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_inter)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + (1 | i | Item)
## logalpha ~ 1 + (1 | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.70 0.46 0.12 1.84 1.00 10470 5582
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI
## sd(beta_Intercept) 0.54 0.16 0.32 0.94
## sd(logalpha_Intercept) 0.93 0.26 0.55 1.56
## cor(beta_Intercept,logalpha_Intercept) -0.09 0.30 -0.63 0.50
## Rhat Bulk_ESS Tail_ESS
## sd(beta_Intercept) 1.00 2967 4770
## sd(logalpha_Intercept) 1.00 4654 5884
## cor(beta_Intercept,logalpha_Intercept) 1.00 5219 5203
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 0.67 0.18 0.31 1.02 1.00 1996 3701
## logalpha_Intercept -0.28 0.72 -1.54 1.25 1.00 10251 5841
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
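Because the 2PL model is non-linear in its parameters, it cannot be written as a single brms formula; it is instead specified with bf() and nl = TRUE, matching the formula block in the summary above. The sketch below omits the priors and sampler settings, which would need to be set as in the accompanying script; note the shared | i | term, which allows the item-level difficulty and discrimination effects to correlate.

```r
library(brms)

# Sketch of the intercept-only 2PL specification. exp(logalpha) keeps the
# discrimination parameter positive, and | i | correlates the item-level
# beta and logalpha effects.
formula_2pl <- bf(
  Resp ~ beta + exp(logalpha) * theta,
  theta ~ 0 + (1 | ID),
  beta ~ 1 + (1 | i | Item),
  logalpha ~ 1 + (1 | i | Item),
  nl = TRUE
)

TwoPL_inter <- brm(
  formula = formula_2pl,
  data    = df_long,
  family  = bernoulli(link = "logit")
  # priors on theta, beta, and logalpha are required in practice
  # and are set in the accompanying script
)
```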
All Trials Fixed & Random Intercepts
mcmc_plot(TwoPL_ranItemTrialsInt, type = "nuts_divergence")
mcmc_plot(TwoPL_ranItemTrialsInt, type = "nuts_treedepth")
mcmc_plot(TwoPL_ranItemTrialsInt, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrialsInt, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrialsInt, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrialsInt, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_ranItemTrialsInt, type = "neff_hist", binwidth = 0.1)
The 2PL model with varying intercepts across trials and fixed parameter intercepts showed no evidence of estimation problems that would call the validity of the results into question. For clarification, readers may note that considerably more trace plots are included for these analyses. This is because these models estimate additional parameters: correlations between each item's difficulty and discrimination intercepts across trials, in addition to the standard deviations of the random effects for difficulty, discrimination, and intercept across trials. Since there are no validity concerns, we can examine the overall model results.
mcmc_plot(TwoPL_ranItemTrialsInt, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_ranItemTrialsInt, nsamples = 50, type = "bars")
pp_check(TwoPL_ranItemTrialsInt, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_ranItemTrialsInt, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model predicts responses with high accuracy. The following general model summary integrates basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_ranItemTrialsInt)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + (1 + Time | i | Item)
## logalpha ~ 1 + (1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.75 0.48 0.12 1.95 1.00 6466 4397
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI
## sd(beta_Intercept) 1.13 0.31 0.69 1.87
## sd(beta_Time1) 1.34 0.30 0.89 2.07
## sd(beta_Time2) 1.71 0.37 1.16 2.57
## sd(logalpha_Intercept) 1.16 0.27 0.73 1.80
## sd(logalpha_Time1) 0.55 0.20 0.24 1.01
## sd(logalpha_Time2) 0.86 0.23 0.49 1.39
## cor(beta_Intercept,beta_Time1) -0.43 0.26 -0.82 0.17
## cor(beta_Intercept,beta_Time2) -0.64 0.24 -0.94 -0.05
## cor(beta_Time1,beta_Time2) 0.64 0.18 0.21 0.90
## cor(beta_Intercept,logalpha_Intercept) -0.11 0.24 -0.55 0.37
## cor(beta_Time1,logalpha_Intercept) 0.00 0.26 -0.49 0.51
## cor(beta_Time2,logalpha_Intercept) 0.18 0.26 -0.35 0.65
## cor(beta_Intercept,logalpha_Time1) -0.10 0.25 -0.57 0.40
## cor(beta_Time1,logalpha_Time1) -0.03 0.27 -0.54 0.51
## cor(beta_Time2,logalpha_Time1) 0.02 0.24 -0.46 0.49
## cor(logalpha_Intercept,logalpha_Time1) -0.31 0.29 -0.78 0.31
## cor(beta_Intercept,logalpha_Time2) 0.03 0.24 -0.43 0.51
## cor(beta_Time1,logalpha_Time2) 0.25 0.22 -0.21 0.64
## cor(beta_Time2,logalpha_Time2) 0.06 0.22 -0.38 0.49
## cor(logalpha_Intercept,logalpha_Time2) -0.69 0.20 -0.94 -0.21
## cor(logalpha_Time1,logalpha_Time2) 0.47 0.30 -0.22 0.91
## Rhat Bulk_ESS Tail_ESS
## sd(beta_Intercept) 1.00 1757 3356
## sd(beta_Time1) 1.00 2099 3229
## sd(beta_Time2) 1.00 1934 3667
## sd(logalpha_Intercept) 1.00 3528 5035
## sd(logalpha_Time1) 1.00 3117 3989
## sd(logalpha_Time2) 1.00 3082 4640
## cor(beta_Intercept,beta_Time1) 1.00 1108 1918
## cor(beta_Intercept,beta_Time2) 1.00 902 1907
## cor(beta_Time1,beta_Time2) 1.00 3523 4499
## cor(beta_Intercept,logalpha_Intercept) 1.00 3502 4655
## cor(beta_Time1,logalpha_Intercept) 1.00 4985 5565
## cor(beta_Time2,logalpha_Intercept) 1.00 4444 4865
## cor(beta_Intercept,logalpha_Time1) 1.00 4793 5210
## cor(beta_Time1,logalpha_Time1) 1.00 6590 6374
## cor(beta_Time2,logalpha_Time1) 1.00 5659 5830
## cor(logalpha_Intercept,logalpha_Time1) 1.00 5719 5585
## cor(beta_Intercept,logalpha_Time2) 1.00 1765 3303
## cor(beta_Time1,logalpha_Time2) 1.00 5146 5519
## cor(beta_Time2,logalpha_Time2) 1.00 4523 5612
## cor(logalpha_Intercept,logalpha_Time2) 1.00 4507 4856
## cor(logalpha_Time1,logalpha_Time2) 1.00 4354 5531
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 0.56 0.39 -0.22 1.31 1.01 637 1237
## logalpha_Intercept -0.19 0.72 -1.47 1.33 1.00 5838 4552
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
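The difference between this model and the next ("All Trials Fixed Intercept") lies only in how Time enters the item-level terms of the bf() specification; everything else is unchanged. A minimal sketch of the two parameterizations, showing the beta line only (the logalpha line is parallel):

```r
# Sketch of the two item-by-trial parameterizations (beta shown; the
# logalpha terms are parallel). These are formula fragments only, not
# complete model specifications.

# Fixed & random intercepts: an item-level intercept plus item-level
# deviations for Time1 and Time2 relative to that intercept.
beta_with_int <- beta ~ 1 + (1 + Time | i | Item)

# Fixed intercept only: separate item-level effects for each trial
# (Time0, Time1, Time2) with no shared item-level intercept.
beta_no_int <- beta ~ 1 + (0 + Time | i | Item)
```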
All Trials Fixed Intercept
mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "nuts_divergence")
mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "nuts_treedepth")
mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "neff_hist", binwidth = 0.1)
The 2PL model with fixed parameter intercepts but no random intercepts showed no evidence of estimation problems that would call the validity of the results into question. Since there are no validity concerns, we can examine the overall model results.
mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_ranItemTrialsNoInt, nsamples = 50, type = "bars")
pp_check(TwoPL_ranItemTrialsNoInt, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_ranItemTrialsNoInt, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model predicts responses with high accuracy. The following general model summary integrates basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_ranItemTrialsNoInt)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + (0 + Time | i | Item)
## logalpha ~ 1 + (0 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.79 0.48 0.15 1.99 1.00 9694 5520
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.67 0.47 0.97 2.77 1.00
## sd(beta_Time1) 0.91 0.25 0.56 1.51 1.00
## sd(beta_Time2) 0.68 0.23 0.38 1.27 1.00
## sd(logalpha_Time0) 1.49 0.37 0.90 2.35 1.00
## sd(logalpha_Time1) 0.74 0.20 0.43 1.23 1.00
## sd(logalpha_Time2) 0.45 0.13 0.26 0.78 1.00
## cor(beta_Time0,beta_Time1) 0.25 0.26 -0.30 0.70 1.00
## cor(beta_Time0,beta_Time2) -0.32 0.27 -0.75 0.28 1.00
## cor(beta_Time1,beta_Time2) 0.14 0.26 -0.38 0.64 1.00
## cor(beta_Time0,logalpha_Time0) 0.06 0.23 -0.40 0.50 1.00
## cor(beta_Time1,logalpha_Time0) -0.09 0.25 -0.54 0.40 1.00
## cor(beta_Time2,logalpha_Time0) 0.16 0.25 -0.34 0.63 1.00
## cor(beta_Time0,logalpha_Time1) -0.01 0.25 -0.50 0.47 1.00
## cor(beta_Time1,logalpha_Time1) -0.22 0.25 -0.65 0.30 1.00
## cor(beta_Time2,logalpha_Time1) 0.12 0.25 -0.38 0.58 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.60 0.20 0.12 0.90 1.00
## cor(beta_Time0,logalpha_Time2) -0.22 0.29 -0.72 0.39 1.00
## cor(beta_Time1,logalpha_Time2) 0.02 0.27 -0.50 0.53 1.00
## cor(beta_Time2,logalpha_Time2) 0.27 0.27 -0.32 0.73 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.30 0.26 -0.25 0.74 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.60 0.21 0.09 0.91 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 1513 2338
## sd(beta_Time1) 3211 4645
## sd(beta_Time2) 2054 2875
## sd(logalpha_Time0) 5495 5726
## sd(logalpha_Time1) 4788 5733
## sd(logalpha_Time2) 4979 6049
## cor(beta_Time0,beta_Time1) 3076 4132
## cor(beta_Time0,beta_Time2) 4218 5494
## cor(beta_Time1,beta_Time2) 3926 4235
## cor(beta_Time0,logalpha_Time0) 4681 5437
## cor(beta_Time1,logalpha_Time0) 5151 5393
## cor(beta_Time2,logalpha_Time0) 3620 5669
## cor(beta_Time0,logalpha_Time1) 5145 5398
## cor(beta_Time1,logalpha_Time1) 7289 6149
## cor(beta_Time2,logalpha_Time1) 5186 5981
## cor(logalpha_Time0,logalpha_Time1) 6550 6713
## cor(beta_Time0,logalpha_Time2) 7824 5706
## cor(beta_Time1,logalpha_Time2) 7176 6176
## cor(beta_Time2,logalpha_Time2) 6740 6109
## cor(logalpha_Time0,logalpha_Time2) 6133 6228
## cor(logalpha_Time1,logalpha_Time2) 5500 5910
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 1.04 0.22 0.56 1.46 1.00 1345 1696
## logalpha_Intercept -0.13 0.67 -1.28 1.35 1.00 9336 5636
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Trial 1 Fixed & Random Intercepts
mcmc_plot(TwoPL_ranItemTrial1Int, type = "nuts_divergence")
mcmc_plot(TwoPL_ranItemTrial1Int, type = "nuts_treedepth")
mcmc_plot(TwoPL_ranItemTrial1Int, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrial1Int, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrial1Int, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrial1Int, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_ranItemTrial1Int, type = "neff_hist", binwidth = 0.1)
When initially run, this model produced two divergent transitions after warmup. To address this estimation issue, the adapt_delta value, which sets the sampler's target acceptance rate and thus controls its step size, was increased from 0.80 to 0.95. With this higher adapt_delta, the 2PL model with varying intercepts for trial 1 versus trials 2 and 3 showed no evidence of estimation problems that would call the validity of the results into question. Since there are no validity concerns, we can examine the overall model results.
mcmc_plot(TwoPL_ranItemTrial1Int, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_ranItemTrial1Int, nsamples = 50, type = "bars")
pp_check(TwoPL_ranItemTrial1Int, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_ranItemTrial1Int, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model predicts responses with high accuracy. The following general model summary integrates basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_ranItemTrial1Int)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + (1 + Trial1 | i | Item)
## logalpha ~ 1 + (1 + Trial1 | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.79 0.48 0.15 1.95 1.00 3995 4347
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI
## sd(beta_Intercept) 0.60 0.19 0.34 1.09
## sd(beta_Trial11) 1.71 0.43 1.09 2.74
## sd(logalpha_Intercept) 0.68 0.20 0.40 1.16
## sd(logalpha_Trial11) 1.16 0.37 0.60 2.02
## cor(beta_Intercept,beta_Trial11) -0.13 0.37 -0.75 0.61
## cor(beta_Intercept,logalpha_Intercept) 0.05 0.30 -0.52 0.60
## cor(beta_Trial11,logalpha_Intercept) -0.14 0.35 -0.74 0.55
## cor(beta_Intercept,logalpha_Trial11) 0.05 0.30 -0.51 0.60
## cor(beta_Trial11,logalpha_Trial11) 0.07 0.26 -0.44 0.56
## cor(logalpha_Intercept,logalpha_Trial11) 0.37 0.27 -0.22 0.81
## Rhat Bulk_ESS Tail_ESS
## sd(beta_Intercept) 1.00 1587 2609
## sd(beta_Trial11) 1.00 1748 3126
## sd(logalpha_Intercept) 1.00 2581 3873
## sd(logalpha_Trial11) 1.00 3023 4353
## cor(beta_Intercept,beta_Trial11) 1.00 475 968
## cor(beta_Intercept,logalpha_Intercept) 1.00 2280 3855
## cor(beta_Trial11,logalpha_Intercept) 1.00 2304 3550
## cor(beta_Intercept,logalpha_Trial11) 1.00 1491 3270
## cor(beta_Trial11,logalpha_Trial11) 1.00 2870 4103
## cor(logalpha_Intercept,logalpha_Trial11) 1.00 2106 3948
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 1.05 0.26 0.50 1.56 1.01 386 1156
## logalpha_Intercept -0.15 0.68 -1.32 1.31 1.00 3490 4409
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
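The divergent transitions noted above were handled by raising adapt_delta, which makes the NUTS sampler take smaller, more careful steps at the cost of sampling speed. A sketch of how this is passed to brm(), with other arguments elided and a hypothetical name for the formula object:

```r
# Sketch: raising the target acceptance rate from the default 0.80 to
# 0.95 to eliminate the post-warmup divergent transitions.
TwoPL_ranItemTrial1Int <- brm(
  formula = formula_2pl_trial1,  # hypothetical name for the bf() object
  data    = df_long,
  family  = bernoulli(link = "logit"),
  control = list(adapt_delta = 0.95)
)
```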
Trial 1 Fixed Intercept
mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "nuts_divergence")
mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "nuts_treedepth")
mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "neff_hist", binwidth = 0.1)
When initially run, this model also produced a divergent transition after warmup. As before, the adapt_delta value was increased from 0.80 to 0.95 to address this estimation issue. With this higher adapt_delta, the 2PL model with fixed intercepts for trial 1 versus trials 2 and 3 showed no evidence of estimation problems that would call the validity of the results into question. Since there are no validity concerns, we can examine the overall model results.
mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_ranItemTrial1NoInt, nsamples = 50, type = "bars")
pp_check(TwoPL_ranItemTrial1NoInt, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_ranItemTrial1NoInt, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model predicts responses with high accuracy. The following general model summary integrates basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_ranItemTrial1NoInt)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + (0 + Trial1 | i | Item)
## logalpha ~ 1 + (0 + Trial1 | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.76 0.50 0.13 2.00 1.00 4884 4113
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI
## sd(beta_Trial10) 0.65 0.26 0.35 1.31
## sd(beta_Trial11) 1.58 0.48 0.87 2.76
## sd(logalpha_Trial10) 0.63 0.17 0.37 1.04
## sd(logalpha_Trial11) 1.49 0.38 0.91 2.35
## cor(beta_Trial10,beta_Trial11) -0.01 0.34 -0.61 0.66
## cor(beta_Trial10,logalpha_Trial10) 0.09 0.30 -0.48 0.64
## cor(beta_Trial11,logalpha_Trial10) -0.20 0.32 -0.74 0.45
## cor(beta_Trial10,logalpha_Trial11) 0.02 0.28 -0.51 0.57
## cor(beta_Trial11,logalpha_Trial11) -0.02 0.27 -0.52 0.49
## cor(logalpha_Trial10,logalpha_Trial11) 0.55 0.23 0.03 0.89
## Rhat Bulk_ESS Tail_ESS
## sd(beta_Trial10) 1.00 1103 2059
## sd(beta_Trial11) 1.00 1171 2675
## sd(logalpha_Trial10) 1.00 2552 3976
## sd(logalpha_Trial11) 1.00 3665 5306
## cor(beta_Trial10,beta_Trial11) 1.00 829 1597
## cor(beta_Trial10,logalpha_Trial10) 1.00 2964 3855
## cor(beta_Trial11,logalpha_Trial10) 1.00 3071 4032
## cor(beta_Trial10,logalpha_Trial11) 1.00 1889 3693
## cor(beta_Trial11,logalpha_Trial11) 1.00 3098 4412
## cor(logalpha_Trial10,logalpha_Trial11) 1.00 2027 3576
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 0.91 0.30 0.25 1.43 1.01 579 1346
## logalpha_Intercept -0.16 0.71 -1.41 1.35 1.00 4833 4253
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Local Dependency Model
mcmc_plot(TwoPL_depmd, type = "nuts_divergence")
mcmc_plot(TwoPL_depmd, type = "nuts_treedepth")
mcmc_plot(TwoPL_depmd, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_depmd, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_depmd, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_depmd, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_depmd, type = "neff_hist", binwidth = 0.1)
The 2PL model with a local dependency effect showed no evidence of estimation problems that would call the validity of the results into question. Since there are no validity concerns, we can examine the overall model results.
mcmc_plot(TwoPL_depmd, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_depmd, nsamples = 50, type = "bars")
pp_check(TwoPL_depmd, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_depmd, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model predicts responses with high accuracy. The following general model summary integrates basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_depmd)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + (-1 + Time | i | Item)
## logalpha ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.67 0.45 0.11 1.82 1.00 9764 5282
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.27 0.37 0.74 2.16 1.00
## sd(beta_Time1) 0.91 0.25 0.55 1.53 1.00
## sd(beta_Time2) 0.82 0.24 0.48 1.43 1.00
## sd(logalpha_Time0) 1.63 0.39 1.01 2.53 1.00
## sd(logalpha_Time1) 0.81 0.32 0.31 1.56 1.00
## sd(logalpha_Time2) 0.50 0.22 0.11 0.98 1.00
## cor(beta_Time0,beta_Time1) 0.31 0.25 -0.25 0.74 1.00
## cor(beta_Time0,beta_Time2) -0.21 0.26 -0.67 0.33 1.00
## cor(beta_Time1,beta_Time2) 0.23 0.25 -0.30 0.68 1.00
## cor(beta_Time0,logalpha_Time0) -0.09 0.24 -0.53 0.38 1.00
## cor(beta_Time1,logalpha_Time0) -0.03 0.24 -0.48 0.45 1.00
## cor(beta_Time2,logalpha_Time0) 0.29 0.24 -0.21 0.70 1.00
## cor(beta_Time0,logalpha_Time1) -0.03 0.27 -0.55 0.47 1.00
## cor(beta_Time1,logalpha_Time1) -0.01 0.27 -0.54 0.51 1.00
## cor(beta_Time2,logalpha_Time1) 0.23 0.28 -0.34 0.71 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.56 0.23 0.03 0.90 1.00
## cor(beta_Time0,logalpha_Time2) -0.08 0.30 -0.64 0.51 1.00
## cor(beta_Time1,logalpha_Time2) 0.13 0.31 -0.49 0.68 1.00
## cor(beta_Time2,logalpha_Time2) 0.29 0.30 -0.35 0.80 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.17 0.30 -0.44 0.73 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.47 0.28 -0.20 0.88 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 1720 2602
## sd(beta_Time1) 3308 5596
## sd(beta_Time2) 2469 4312
## sd(logalpha_Time0) 6088 6761
## sd(logalpha_Time1) 2696 2733
## sd(logalpha_Time2) 2008 1383
## cor(beta_Time0,beta_Time1) 3842 5199
## cor(beta_Time0,beta_Time2) 4779 5177
## cor(beta_Time1,beta_Time2) 4385 4932
## cor(beta_Time0,logalpha_Time0) 5474 5463
## cor(beta_Time1,logalpha_Time0) 6602 5979
## cor(beta_Time2,logalpha_Time0) 4232 5412
## cor(beta_Time0,logalpha_Time1) 6394 5946
## cor(beta_Time1,logalpha_Time1) 7633 6133
## cor(beta_Time2,logalpha_Time1) 5289 5555
## cor(logalpha_Time0,logalpha_Time1) 6604 6444
## cor(beta_Time0,logalpha_Time2) 9588 6431
## cor(beta_Time1,logalpha_Time2) 8505 6599
## cor(beta_Time2,logalpha_Time2) 6561 6794
## cor(logalpha_Time0,logalpha_Time2) 5705 5733
## cor(logalpha_Time1,logalpha_Time2) 5307 4443
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 0.52 0.22 0.07 0.95 1.00 1184 1996
## beta_DepButter 0.10 0.09 -0.08 0.26 1.00 5312 5759
## beta_DepArm 0.02 0.09 -0.16 0.18 1.00 3956 5770
## beta_DepShore 0.44 0.08 0.28 0.59 1.00 5895 5969
## beta_DepLetter 0.42 0.06 0.31 0.54 1.00 10343 6025
## beta_DepQueen 0.27 0.07 0.13 0.41 1.00 8123 6214
## beta_DepCabin 0.20 0.07 0.07 0.33 1.00 7043 5941
## beta_DepPole 0.25 0.07 0.12 0.38 1.00 11485 5876
## beta_DepTicket 0.22 0.07 0.09 0.35 1.00 10411 6190
## beta_DepGrass 0.50 0.08 0.35 0.65 1.00 11210 6082
## beta_DepEngine 0.23 0.06 0.12 0.35 1.00 16286 6281
## logalpha_Intercept -0.31 0.74 -1.57 1.28 1.00 9347 5466
## logalpha_DepButter 0.07 0.13 -0.18 0.36 1.00 5546 4942
## logalpha_DepArm 0.30 0.10 0.10 0.49 1.00 3376 5166
## logalpha_DepShore 0.06 0.12 -0.18 0.31 1.00 8927 6267
## logalpha_DepLetter -1.03 0.62 -2.43 0.01 1.00 6629 5933
## logalpha_DepQueen 0.08 0.11 -0.12 0.29 1.00 4103 4470
## logalpha_DepCabin 0.15 0.13 -0.10 0.41 1.00 4169 5928
## logalpha_DepPole -0.48 0.59 -2.03 0.29 1.00 3211 4244
## logalpha_DepTicket 0.24 0.10 0.03 0.44 1.00 6959 6465
## logalpha_DepGrass 0.14 0.21 -0.27 0.59 1.00 3858 5076
## logalpha_DepEngine -0.03 0.20 -0.43 0.38 1.00 5403 6075
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Local Dependency (Unique by Item Easiness)
mcmc_plot(TwoPL_depea, type = "nuts_divergence")
mcmc_plot(TwoPL_depea, type = "nuts_treedepth")
mcmc_plot(TwoPL_depea, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_depea, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_depea, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_depea, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_depea, type = "neff_hist", binwidth = 0.1)
The 2PL model with a local dependency effect on only the easiness parameter showed no evidence of estimation problems that would call the validity of the results into question. Since there are no validity concerns, we can examine the overall model results.
mcmc_plot(TwoPL_depea, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_depea, nsamples = 50, type = "bars")
pp_check(TwoPL_depea, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_depea, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model predicts responses with high accuracy. The following general model summary integrates basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_depea)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + (-1 + Time | i | Item)
## logalpha ~ 1 + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.72 0.46 0.12 1.85 1.00 7820 4729
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.32 0.39 0.76 2.29 1.00
## sd(beta_Time1) 0.98 0.27 0.60 1.65 1.00
## sd(beta_Time2) 0.93 0.28 0.54 1.63 1.00
## sd(logalpha_Time0) 1.63 0.40 1.00 2.55 1.00
## sd(logalpha_Time1) 1.03 0.31 0.59 1.76 1.00
## sd(logalpha_Time2) 0.62 0.19 0.35 1.08 1.00
## cor(beta_Time0,beta_Time1) 0.28 0.26 -0.27 0.71 1.00
## cor(beta_Time0,beta_Time2) -0.23 0.26 -0.68 0.33 1.00
## cor(beta_Time1,beta_Time2) 0.28 0.25 -0.26 0.72 1.00
## cor(beta_Time0,logalpha_Time0) -0.08 0.24 -0.53 0.38 1.00
## cor(beta_Time1,logalpha_Time0) -0.00 0.24 -0.47 0.48 1.00
## cor(beta_Time2,logalpha_Time0) 0.28 0.24 -0.22 0.69 1.00
## cor(beta_Time0,logalpha_Time1) 0.02 0.25 -0.47 0.48 1.00
## cor(beta_Time1,logalpha_Time1) 0.02 0.25 -0.48 0.50 1.00
## cor(beta_Time2,logalpha_Time1) 0.18 0.26 -0.33 0.65 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.50 0.21 0.02 0.83 1.00
## cor(beta_Time0,logalpha_Time2) -0.07 0.27 -0.59 0.46 1.00
## cor(beta_Time1,logalpha_Time2) 0.26 0.27 -0.30 0.72 1.00
## cor(beta_Time2,logalpha_Time2) 0.30 0.27 -0.27 0.75 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.15 0.25 -0.35 0.62 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.58 0.22 0.07 0.90 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 1154 2286
## sd(beta_Time1) 2722 3842
## sd(beta_Time2) 1966 3037
## sd(logalpha_Time0) 5373 5484
## sd(logalpha_Time1) 5284 5980
## sd(logalpha_Time2) 4564 4942
## cor(beta_Time0,beta_Time1) 3062 4242
## cor(beta_Time0,beta_Time2) 3886 4852
## cor(beta_Time1,beta_Time2) 4003 4825
## cor(beta_Time0,logalpha_Time0) 4136 5527
## cor(beta_Time1,logalpha_Time0) 4418 5362
## cor(beta_Time2,logalpha_Time0) 3426 5295
## cor(beta_Time0,logalpha_Time1) 5002 5969
## cor(beta_Time1,logalpha_Time1) 5503 5390
## cor(beta_Time2,logalpha_Time1) 4262 5194
## cor(logalpha_Time0,logalpha_Time1) 5781 6278
## cor(beta_Time0,logalpha_Time2) 6308 5832
## cor(beta_Time1,logalpha_Time2) 5667 5815
## cor(beta_Time2,logalpha_Time2) 5887 6121
## cor(logalpha_Time0,logalpha_Time2) 5633 5822
## cor(logalpha_Time1,logalpha_Time2) 5801 6971
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 0.54 0.24 0.07 0.99 1.00 975 1613
## beta_DepButter 0.09 0.08 -0.07 0.24 1.00 6116 5519
## beta_DepArm -0.12 0.09 -0.30 0.06 1.00 3814 5255
## beta_DepShore 0.45 0.08 0.29 0.60 1.00 5940 5951
## beta_DepLetter 0.42 0.06 0.31 0.53 1.00 11326 6457
## beta_DepQueen 0.23 0.07 0.09 0.36 1.00 9820 5904
## beta_DepCabin 0.18 0.07 0.05 0.31 1.00 6794 6087
## beta_DepPole 0.24 0.07 0.11 0.37 1.00 11461 5960
## beta_DepTicket 0.16 0.06 0.03 0.29 1.00 10811 6127
## beta_DepGrass 0.48 0.07 0.34 0.63 1.00 10776 6514
## beta_DepEngine 0.23 0.06 0.12 0.34 1.00 13633 6502
## logalpha_Intercept -0.28 0.70 -1.46 1.25 1.00 7308 5273
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
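The Rhat and effective-sample-size histograms used throughout this document can also be checked numerically. The sketch below is illustrative rather than part of the original script; `fit` is a placeholder for any fitted brmsfit object in this document (e.g., TwoPL_depea), and the cutoffs are common heuristics, not hard rules.

```r
library(brms)

# Numeric convergence checks for a fitted brms model.
# `fit` is a placeholder for any brmsfit object (e.g., TwoPL_depea).
rhats  <- rhat(fit)        # split-chain Rhat for every parameter
ratios <- neff_ratio(fit)  # effective sample size / total post-warmup draws

# Heuristic screens: Rhat near 1 (below ~1.01) and no tiny ESS ratios.
max(rhats, na.rm = TRUE)
any(ratios < 0.1, na.rm = TRUE)
```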
Learning (Growth) Model
mcmc_plot(TwoPL_learn, type = "nuts_divergence")
## No divergences to plot.
mcmc_plot(TwoPL_learn, type = "nuts_treedepth")
mcmc_plot(TwoPL_learn, type = "trace", regex_pars = "b_")
mcmc_plot(TwoPL_learn, type = "trace", regex_pars = "sd_")
mcmc_plot(TwoPL_learn, type = "trace", regex_pars = "cor_")
mcmc_plot(TwoPL_learn, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_learn, type = "neff_hist", binwidth = 0.1)
The 2PL model with a multidimensional growth effect showed no evidence of estimation problems that would call the validity of the results into question. Since there are no validity concerns, we can examine the overall results of the model.
mcmc_plot(TwoPL_learn, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_learn, nsamples = 50, type = "bars")
pp_check(TwoPL_learn, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_learn, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_learn)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (-1 + Time1 + Time2 + Time3 | ID)
## beta ~ 1 + (-1 + Time | i | Item)
## logalpha ~ 1 + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(theta_Time1) 1.33 0.59 0.40 2.63 1.00 4661
## sd(theta_Time2) 0.15 0.20 0.00 0.69 1.00 1511
## sd(theta_Time3) 0.11 0.14 0.00 0.48 1.00 3044
## cor(theta_Time1,theta_Time2) 0.18 0.56 -0.88 0.97 1.00 4692
## cor(theta_Time1,theta_Time3) 0.17 0.56 -0.88 0.97 1.00 6254
## cor(theta_Time2,theta_Time3) 0.01 0.51 -0.88 0.88 1.00 5187
## Tail_ESS
## sd(theta_Time1) 3307
## sd(theta_Time2) 3135
## sd(theta_Time3) 3826
## cor(theta_Time1,theta_Time2) 3672
## cor(theta_Time1,theta_Time3) 4528
## cor(theta_Time2,theta_Time3) 5283
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.68 0.49 0.94 2.88 1.00
## sd(beta_Time1) 0.92 0.26 0.56 1.56 1.00
## sd(beta_Time2) 0.69 0.24 0.37 1.28 1.00
## sd(logalpha_Time0) 1.49 0.38 0.90 2.36 1.00
## sd(logalpha_Time1) 0.75 0.21 0.44 1.25 1.00
## sd(logalpha_Time2) 0.44 0.13 0.25 0.75 1.00
## cor(beta_Time0,beta_Time1) 0.26 0.27 -0.30 0.73 1.00
## cor(beta_Time0,beta_Time2) -0.31 0.27 -0.76 0.28 1.00
## cor(beta_Time1,beta_Time2) 0.15 0.27 -0.40 0.65 1.00
## cor(beta_Time0,logalpha_Time0) 0.04 0.24 -0.43 0.49 1.00
## cor(beta_Time1,logalpha_Time0) -0.09 0.26 -0.57 0.42 1.00
## cor(beta_Time2,logalpha_Time0) 0.17 0.25 -0.33 0.62 1.00
## cor(beta_Time0,logalpha_Time1) -0.02 0.26 -0.52 0.46 1.00
## cor(beta_Time1,logalpha_Time1) -0.23 0.25 -0.66 0.30 1.00
## cor(beta_Time2,logalpha_Time1) 0.12 0.25 -0.38 0.58 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.60 0.21 0.11 0.90 1.00
## cor(beta_Time0,logalpha_Time2) -0.19 0.30 -0.70 0.42 1.00
## cor(beta_Time1,logalpha_Time2) 0.02 0.27 -0.51 0.54 1.00
## cor(beta_Time2,logalpha_Time2) 0.26 0.27 -0.30 0.73 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.32 0.25 -0.21 0.76 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.60 0.21 0.10 0.91 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 826 1390
## sd(beta_Time1) 1632 3260
## sd(beta_Time2) 1052 1497
## sd(logalpha_Time0) 3385 5098
## sd(logalpha_Time1) 3405 4869
## sd(logalpha_Time2) 3761 4819
## cor(beta_Time0,beta_Time1) 1564 2786
## cor(beta_Time0,beta_Time2) 2533 3685
## cor(beta_Time1,beta_Time2) 2399 3084
## cor(beta_Time0,logalpha_Time0) 2536 4172
## cor(beta_Time1,logalpha_Time0) 3060 4360
## cor(beta_Time2,logalpha_Time0) 1952 3506
## cor(beta_Time0,logalpha_Time1) 3400 4410
## cor(beta_Time1,logalpha_Time1) 4056 4815
## cor(beta_Time2,logalpha_Time1) 3062 4886
## cor(logalpha_Time0,logalpha_Time1) 4174 5048
## cor(beta_Time0,logalpha_Time2) 4863 5073
## cor(beta_Time1,logalpha_Time2) 4914 5092
## cor(beta_Time2,logalpha_Time2) 4830 5639
## cor(logalpha_Time0,logalpha_Time2) 3794 5495
## cor(logalpha_Time1,logalpha_Time2) 4577 6036
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 1.03 0.24 0.47 1.44 1.00 658 658
## logalpha_Intercept -0.80 0.50 -1.68 0.30 1.00 4238 3760
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
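For readers following along with the script, the formula block printed in the summary above corresponds to a nonlinear brms specification. A minimal sketch of how TwoPL_learn may have been declared (priors and fitting arguments omitted; it assumes `df_long` contains Resp, ID, Item, Time, and the Time1–Time3 indicators):

```r
library(brms)

# Nonlinear 2PL formula: easiness (beta) and log-discrimination (logalpha)
# enter the logit directly; theta varies by person and by trial.
formula_learn <- bf(
  Resp ~ beta + exp(logalpha) * theta,
  theta ~ 0 + (-1 + Time1 + Time2 + Time3 | ID),
  beta ~ 1 + (-1 + Time | i | Item),
  logalpha ~ 1 + (-1 + Time | i | Item),
  nl = TRUE
)
# TwoPL_learn <- brm(formula_learn, data = df_long,
#                    family = bernoulli(link = "logit"), ...)
```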
Multidimensional Model
mcmc_plot(TwoPL_multi, type = "nuts_divergence")
## No divergences to plot.
mcmc_plot(TwoPL_multi, type = "nuts_treedepth")
mcmc_plot(TwoPL_multi, type = "trace", regex_pars = "b_")
mcmc_plot(TwoPL_multi, type = "trace", regex_pars = "sd_")
mcmc_plot(TwoPL_multi, type = "trace", regex_pars = "cor_")
mcmc_plot(TwoPL_multi, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_multi, type = "neff_hist", binwidth = 0.1)
The 2PL model with multidimensional factors by trial showed no evidence of estimation problems that would call the validity of the results into question. When initially run, the tail effective sample sizes were low; the model was refit with slightly more iterations (4,000 per chain, 2,000 of them warmup), which resolved the issue. Since there are no validity concerns, we can examine the overall results of the model.
mcmc_plot(TwoPL_multi, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_multi, nsamples = 50, type = "bars")
pp_check(TwoPL_multi, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_multi, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_multi)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (-1 + Time | ID)
## beta ~ 1 + (-1 + Time | i | Item)
## logalpha ~ 1 + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(theta_Time0) 0.58 0.34 0.13 1.42 1.00 2893
## sd(theta_Time1) 0.83 0.37 0.26 1.68 1.00 2935
## sd(theta_Time2) 1.04 0.45 0.34 2.05 1.00 2791
## cor(theta_Time0,theta_Time1) 0.99 0.01 0.97 1.00 1.00 1051
## cor(theta_Time0,theta_Time2) 0.99 0.01 0.96 1.00 1.00 988
## cor(theta_Time1,theta_Time2) 0.99 0.01 0.98 1.00 1.00 3829
## Tail_ESS
## sd(theta_Time0) 3162
## sd(theta_Time1) 3019
## sd(theta_Time2) 2766
## cor(theta_Time0,theta_Time1) 2217
## cor(theta_Time0,theta_Time2) 1999
## cor(theta_Time1,theta_Time2) 5616
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.68 0.49 0.99 2.85 1.01
## sd(beta_Time1) 0.90 0.25 0.56 1.52 1.00
## sd(beta_Time2) 0.66 0.21 0.37 1.20 1.00
## sd(logalpha_Time0) 1.53 0.38 0.93 2.41 1.00
## sd(logalpha_Time1) 0.79 0.23 0.45 1.34 1.00
## sd(logalpha_Time2) 0.44 0.13 0.25 0.76 1.00
## cor(beta_Time0,beta_Time1) 0.25 0.26 -0.30 0.72 1.00
## cor(beta_Time0,beta_Time2) -0.30 0.28 -0.75 0.30 1.00
## cor(beta_Time1,beta_Time2) 0.15 0.26 -0.38 0.63 1.00
## cor(beta_Time0,logalpha_Time0) -0.10 0.27 -0.60 0.42 1.00
## cor(beta_Time1,logalpha_Time0) -0.10 0.25 -0.55 0.40 1.00
## cor(beta_Time2,logalpha_Time0) 0.27 0.24 -0.24 0.69 1.00
## cor(beta_Time0,logalpha_Time1) -0.09 0.28 -0.62 0.46 1.00
## cor(beta_Time1,logalpha_Time1) -0.23 0.25 -0.68 0.29 1.00
## cor(beta_Time2,logalpha_Time1) 0.16 0.26 -0.36 0.62 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.56 0.22 0.06 0.88 1.00
## cor(beta_Time0,logalpha_Time2) -0.13 0.30 -0.68 0.47 1.00
## cor(beta_Time1,logalpha_Time2) 0.03 0.26 -0.48 0.53 1.00
## cor(beta_Time2,logalpha_Time2) 0.23 0.27 -0.35 0.70 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.36 0.25 -0.18 0.77 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.60 0.22 0.07 0.91 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 858 1354
## sd(beta_Time1) 2045 3346
## sd(beta_Time2) 1430 2367
## sd(logalpha_Time0) 3380 4809
## sd(logalpha_Time1) 3470 5125
## sd(logalpha_Time2) 2950 4020
## cor(beta_Time0,beta_Time1) 1731 2936
## cor(beta_Time0,beta_Time2) 1804 3267
## cor(beta_Time1,beta_Time2) 2217 3692
## cor(beta_Time0,logalpha_Time0) 4381 4801
## cor(beta_Time1,logalpha_Time0) 4087 5195
## cor(beta_Time2,logalpha_Time0) 3086 4310
## cor(beta_Time0,logalpha_Time1) 4375 5041
## cor(beta_Time1,logalpha_Time1) 4217 4771
## cor(beta_Time2,logalpha_Time1) 3811 4490
## cor(logalpha_Time0,logalpha_Time1) 4105 5290
## cor(beta_Time0,logalpha_Time2) 5024 5241
## cor(beta_Time1,logalpha_Time2) 4380 5024
## cor(beta_Time2,logalpha_Time2) 4379 4639
## cor(logalpha_Time0,logalpha_Time2) 4121 4698
## cor(logalpha_Time1,logalpha_Time2) 4179 5542
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 1.05 0.21 0.60 1.43 1.01 782 1247
## logalpha_Intercept -0.47 0.48 -1.30 0.57 1.00 2691 2678
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
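Because the family is Bernoulli with a logit link, the linear predictor beta + exp(logalpha) * theta converts to a response probability through the inverse logit. A small standalone illustration, plugging in the population-level intercepts reported just above purely for demonstration:

```r
# Item response probability under the 2PL parameterization used here.
# beta = easiness, exp(logalpha) = discrimination, theta = ability.
irt_prob <- function(theta, beta, logalpha) {
  plogis(beta + exp(logalpha) * theta)
}

# Average item at the intercepts reported above (beta = 1.05, logalpha = -0.47):
irt_prob(theta = 0, beta = 1.05, logalpha = -0.47)
# Probability rises with ability for a fixed item:
irt_prob(theta = c(-1, 0, 1), beta = 1.05, logalpha = -0.47)
```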
Serial Position Effect
mcmc_plot(TwoPL_srlps, type = "nuts_divergence")
## No divergences to plot.
mcmc_plot(TwoPL_srlps, type = "nuts_treedepth")
mcmc_plot(TwoPL_srlps, type = "trace", regex_pars = "b_")
mcmc_plot(TwoPL_srlps, type = "trace", regex_pars = "sd_")
mcmc_plot(TwoPL_srlps, type = "trace", regex_pars = "cor_")
mcmc_plot(TwoPL_srlps, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_srlps, type = "neff_hist", binwidth = 0.1)
The 2PL model with a serial position effect showed no evidence of estimation problems that would call the validity of the results into question. Since there are no validity concerns, we can examine the overall results of the model.
mcmc_plot(TwoPL_srlps, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_srlps, nsamples = 50, type = "bars")
pp_check(TwoPL_srlps, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_srlps, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_srlps)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + poly(ItemPos - 1, 2) + (-1 + Time | i | Item)
## logalpha ~ 1 + poly(ItemPos - 1, 2) + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.72 0.47 0.12 1.88 1.00 7503 5417
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.28 0.38 0.75 2.22 1.00
## sd(beta_Time1) 0.97 0.27 0.59 1.63 1.00
## sd(beta_Time2) 0.91 0.27 0.53 1.57 1.00
## sd(logalpha_Time0) 1.63 0.40 0.99 2.53 1.00
## sd(logalpha_Time1) 1.03 0.30 0.58 1.77 1.00
## sd(logalpha_Time2) 0.61 0.18 0.35 1.05 1.00
## cor(beta_Time0,beta_Time1) 0.28 0.25 -0.25 0.72 1.00
## cor(beta_Time0,beta_Time2) -0.23 0.25 -0.67 0.31 1.00
## cor(beta_Time1,beta_Time2) 0.27 0.26 -0.28 0.70 1.00
## cor(beta_Time0,logalpha_Time0) -0.09 0.24 -0.54 0.39 1.00
## cor(beta_Time1,logalpha_Time0) -0.01 0.24 -0.47 0.46 1.00
## cor(beta_Time2,logalpha_Time0) 0.28 0.23 -0.20 0.69 1.00
## cor(beta_Time0,logalpha_Time1) 0.01 0.25 -0.47 0.49 1.00
## cor(beta_Time1,logalpha_Time1) 0.01 0.25 -0.49 0.49 1.00
## cor(beta_Time2,logalpha_Time1) 0.19 0.25 -0.34 0.64 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.49 0.22 -0.00 0.84 1.00
## cor(beta_Time0,logalpha_Time2) -0.07 0.27 -0.58 0.46 1.00
## cor(beta_Time1,logalpha_Time2) 0.26 0.26 -0.29 0.71 1.00
## cor(beta_Time2,logalpha_Time2) 0.30 0.27 -0.26 0.76 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.15 0.25 -0.36 0.61 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.57 0.22 0.06 0.90 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 1612 2281
## sd(beta_Time1) 2800 4438
## sd(beta_Time2) 1809 2963
## sd(logalpha_Time0) 5548 6286
## sd(logalpha_Time1) 5338 6238
## sd(logalpha_Time2) 4450 5336
## cor(beta_Time0,beta_Time1) 3003 3967
## cor(beta_Time0,beta_Time2) 4126 4810
## cor(beta_Time1,beta_Time2) 3501 4409
## cor(beta_Time0,logalpha_Time0) 4433 5242
## cor(beta_Time1,logalpha_Time0) 4962 5816
## cor(beta_Time2,logalpha_Time0) 3977 5440
## cor(beta_Time0,logalpha_Time1) 5491 5649
## cor(beta_Time1,logalpha_Time1) 5947 6541
## cor(beta_Time2,logalpha_Time1) 4552 6202
## cor(logalpha_Time0,logalpha_Time1) 6261 6248
## cor(beta_Time0,logalpha_Time2) 7388 6146
## cor(beta_Time1,logalpha_Time2) 5438 5828
## cor(beta_Time2,logalpha_Time2) 6527 6352
## cor(logalpha_Time0,logalpha_Time2) 5600 6135
## cor(logalpha_Time1,logalpha_Time2) 5944 6077
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept 0.54 0.24 0.08 1.01 1.00 1028
## beta_DepButter 0.09 0.08 -0.07 0.24 1.00 7093
## beta_DepArm -0.11 0.09 -0.30 0.06 1.00 3846
## beta_DepShore 0.45 0.08 0.30 0.61 1.00 5793
## beta_DepLetter 0.42 0.06 0.30 0.54 1.00 13324
## beta_DepQueen 0.23 0.07 0.10 0.37 1.00 10710
## beta_DepCabin 0.18 0.07 0.05 0.32 1.00 8003
## beta_DepPole 0.24 0.07 0.11 0.37 1.00 12604
## beta_DepTicket 0.16 0.06 0.04 0.29 1.00 12334
## beta_DepGrass 0.48 0.08 0.33 0.63 1.00 12392
## beta_DepEngine 0.23 0.06 0.12 0.35 1.00 15090
## beta_polyItemPosM121 0.26 2.00 -3.64 4.27 1.00 21265
## beta_polyItemPosM122 0.40 1.96 -3.47 4.22 1.00 18852
## logalpha_Intercept -0.26 0.71 -1.48 1.26 1.00 7248
## logalpha_polyItemPosM121 -0.17 1.00 -2.18 1.84 1.00 17834
## logalpha_polyItemPosM122 0.03 1.00 -1.94 1.95 1.00 20710
## Tail_ESS
## beta_Intercept 1706
## beta_DepButter 5512
## beta_DepArm 4735
## beta_DepShore 5959
## beta_DepLetter 6106
## beta_DepQueen 6254
## beta_DepCabin 5957
## beta_DepPole 5818
## beta_DepTicket 6101
## beta_DepGrass 6081
## beta_DepEngine 6224
## beta_polyItemPosM121 5575
## beta_polyItemPosM122 4806
## logalpha_Intercept 5171
## logalpha_polyItemPosM121 5145
## logalpha_polyItemPosM122 5319
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
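The poly(ItemPos - 1, 2) terms in the serial position models are orthogonal polynomial contrasts, so the linear and quadratic position effects are uncorrelated by construction. A quick standalone illustration of what poly() produces, assuming item positions 1 through 10 (matching the ten items on this test):

```r
# Orthogonal linear and quadratic contrasts over ten serial positions.
pos <- 1:10
P <- poly(pos - 1, 2)  # columns ".1" and ".2": degree-1 and degree-2 contrasts

round(crossprod(P), 10)  # identity matrix: columns are orthonormal
round(colSums(P), 10)    # each contrast also sums to (numerically) zero
```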
Serial Position Effect by Trial
mcmc_plot(TwoPL_t3spe, type = "nuts_divergence")
## No divergences to plot.
mcmc_plot(TwoPL_t3spe, type = "nuts_treedepth")
mcmc_plot(TwoPL_t3spe, type = "trace", regex_pars = "b_")
mcmc_plot(TwoPL_t3spe, type = "trace", regex_pars = "sd_")
mcmc_plot(TwoPL_t3spe, type = "trace", regex_pars = "cor_")
mcmc_plot(TwoPL_t3spe, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_t3spe, type = "neff_hist", binwidth = 0.1)
The 2PL model with a unique serial position effect for each trial showed no evidence of estimation problems that would call the validity of the results into question. Since there are no validity concerns, we can examine the overall results of the model.
mcmc_plot(TwoPL_t3spe, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_t3spe, nsamples = 50, type = "bars")
pp_check(TwoPL_t3spe, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_t3spe, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_t3spe)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + poly(ItemPos - 1, 2):Time + (-1 + Time | i | Item)
## logalpha ~ 1 + poly(ItemPos - 1, 2):Time + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.72 0.47 0.12 1.88 1.00 8383 4410
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.29 0.38 0.75 2.19 1.00
## sd(beta_Time1) 0.98 0.27 0.60 1.64 1.00
## sd(beta_Time2) 0.92 0.28 0.53 1.60 1.00
## sd(logalpha_Time0) 1.62 0.39 0.99 2.48 1.00
## sd(logalpha_Time1) 1.02 0.30 0.59 1.74 1.00
## sd(logalpha_Time2) 0.61 0.19 0.34 1.07 1.00
## cor(beta_Time0,beta_Time1) 0.28 0.26 -0.27 0.72 1.00
## cor(beta_Time0,beta_Time2) -0.23 0.26 -0.67 0.30 1.00
## cor(beta_Time1,beta_Time2) 0.27 0.25 -0.27 0.70 1.00
## cor(beta_Time0,logalpha_Time0) -0.08 0.24 -0.54 0.39 1.00
## cor(beta_Time1,logalpha_Time0) -0.00 0.25 -0.48 0.47 1.00
## cor(beta_Time2,logalpha_Time0) 0.28 0.24 -0.22 0.69 1.00
## cor(beta_Time0,logalpha_Time1) 0.01 0.25 -0.48 0.50 1.00
## cor(beta_Time1,logalpha_Time1) 0.02 0.26 -0.48 0.51 1.00
## cor(beta_Time2,logalpha_Time1) 0.18 0.26 -0.33 0.65 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.49 0.22 0.01 0.83 1.00
## cor(beta_Time0,logalpha_Time2) -0.08 0.27 -0.59 0.45 1.00
## cor(beta_Time1,logalpha_Time2) 0.26 0.27 -0.31 0.72 1.00
## cor(beta_Time2,logalpha_Time2) 0.31 0.26 -0.24 0.75 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.15 0.25 -0.35 0.62 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.57 0.22 0.06 0.90 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 1725 2572
## sd(beta_Time1) 3104 4512
## sd(beta_Time2) 1981 3521
## sd(logalpha_Time0) 6025 6133
## sd(logalpha_Time1) 4839 5739
## sd(logalpha_Time2) 3797 5276
## cor(beta_Time0,beta_Time1) 3269 4028
## cor(beta_Time0,beta_Time2) 4368 4683
## cor(beta_Time1,beta_Time2) 3890 5407
## cor(beta_Time0,logalpha_Time0) 4416 5103
## cor(beta_Time1,logalpha_Time0) 5108 5530
## cor(beta_Time2,logalpha_Time0) 3481 4853
## cor(beta_Time0,logalpha_Time1) 5216 5194
## cor(beta_Time1,logalpha_Time1) 5498 5395
## cor(beta_Time2,logalpha_Time1) 4339 5357
## cor(logalpha_Time0,logalpha_Time1) 5544 6467
## cor(beta_Time0,logalpha_Time2) 7332 6378
## cor(beta_Time1,logalpha_Time2) 5843 6119
## cor(beta_Time2,logalpha_Time2) 6440 6471
## cor(logalpha_Time0,logalpha_Time2) 5595 6551
## cor(logalpha_Time1,logalpha_Time2) 5774 6057
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat
## beta_Intercept 0.53 0.24 0.07 1.00 1.00
## beta_DepButter 0.09 0.08 -0.07 0.24 1.00
## beta_DepArm -0.11 0.09 -0.30 0.06 1.00
## beta_DepShore 0.45 0.08 0.29 0.60 1.00
## beta_DepLetter 0.42 0.06 0.31 0.54 1.00
## beta_DepQueen 0.23 0.07 0.09 0.36 1.00
## beta_DepCabin 0.18 0.07 0.05 0.31 1.00
## beta_DepPole 0.24 0.07 0.11 0.37 1.00
## beta_DepTicket 0.16 0.06 0.04 0.28 1.00
## beta_DepGrass 0.48 0.07 0.34 0.63 1.00
## beta_DepEngine 0.23 0.06 0.12 0.35 1.00
## beta_polyItemPosM121:Time0 -0.03 1.98 -3.93 3.84 1.00
## beta_polyItemPosM122:Time0 0.17 1.97 -3.70 4.13 1.00
## beta_polyItemPosM121:Time1 0.08 2.02 -3.81 4.07 1.00
## beta_polyItemPosM122:Time1 0.15 2.00 -3.81 4.16 1.00
## beta_polyItemPosM121:Time2 0.17 2.00 -3.75 4.05 1.00
## beta_polyItemPosM122:Time2 0.09 2.01 -3.91 4.04 1.00
## logalpha_Intercept -0.26 0.72 -1.47 1.30 1.00
## logalpha_polyItemPosM121:Time0 -0.03 1.02 -2.01 1.95 1.00
## logalpha_polyItemPosM122:Time0 -0.02 1.00 -2.01 1.97 1.00
## logalpha_polyItemPosM121:Time1 -0.05 1.00 -2.02 1.92 1.00
## logalpha_polyItemPosM122:Time1 0.02 1.01 -1.95 1.99 1.00
## logalpha_polyItemPosM121:Time2 -0.07 1.01 -2.06 1.91 1.00
## logalpha_polyItemPosM122:Time2 0.00 1.00 -1.95 2.00 1.00
## Bulk_ESS Tail_ESS
## beta_Intercept 1291 2051
## beta_DepButter 6586 6053
## beta_DepArm 4071 5410
## beta_DepShore 5292 6252
## beta_DepLetter 13495 6087
## beta_DepQueen 9999 6521
## beta_DepCabin 6465 5779
## beta_DepPole 12614 5818
## beta_DepTicket 12269 6536
## beta_DepGrass 12300 6218
## beta_DepEngine 14650 5839
## beta_polyItemPosM121:Time0 19993 5095
## beta_polyItemPosM122:Time0 18343 5407
## beta_polyItemPosM121:Time1 17260 6230
## beta_polyItemPosM122:Time1 19210 5005
## beta_polyItemPosM121:Time2 17684 5485
## beta_polyItemPosM122:Time2 18779 4872
## logalpha_Intercept 7949 4706
## logalpha_polyItemPosM121:Time0 17364 5588
## logalpha_polyItemPosM122:Time0 19046 5239
## logalpha_polyItemPosM121:Time1 18812 5238
## logalpha_polyItemPosM122:Time1 18410 5312
## logalpha_polyItemPosM121:Time2 20050 5277
## logalpha_polyItemPosM122:Time2 18592 5775
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Item Covariates
mcmc_plot(TwoPL_itmex, type = "nuts_divergence")
## No divergences to plot.
mcmc_plot(TwoPL_itmex, type = "nuts_treedepth")
mcmc_plot(TwoPL_itmex, type = "trace", regex_pars = "b_")
mcmc_plot(TwoPL_itmex, type = "trace", regex_pars = "sd_")
mcmc_plot(TwoPL_itmex, type = "trace", regex_pars = "cor_")
mcmc_plot(TwoPL_itmex, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_itmex, type = "neff_hist", binwidth = 0.1)
The 2PL model with all item covariates showed no evidence of estimation problems that would call the validity of the results into question. The first fit did, however, hit the maximum treedepth on roughly 1,100 iterations; this was addressed by increasing max_treedepth from its default of 10 to 20. Since there are no validity concerns, we can examine the overall results of the model.
mcmc_plot(TwoPL_itmex, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_itmex, nsamples = 50, type = "bars")
pp_check(TwoPL_itmex, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_itmex, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_itmex)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqSTX + Concrete + Density + Diversity + AoA + BOI + Phonemes + Ambiguous + NamingZ + (-1 + Time | i | Item)
## logalpha ~ 1 + FreqSTX + Concrete + Density + Diversity + AoA + BOI + Phonemes + Ambiguous + NamingZ + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.69 0.50 0.08 1.93 1.00 5937 4592
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 0.98 0.41 0.37 1.96 1.00
## sd(beta_Time1) 0.61 0.28 0.22 1.31 1.00
## sd(beta_Time2) 1.06 0.38 0.47 1.96 1.00
## sd(logalpha_Time0) 1.46 0.40 0.82 2.39 1.00
## sd(logalpha_Time1) 0.82 0.29 0.36 1.50 1.00
## sd(logalpha_Time2) 0.60 0.29 0.10 1.26 1.00
## cor(beta_Time0,beta_Time1) -0.03 0.37 -0.72 0.65 1.00
## cor(beta_Time0,beta_Time2) -0.26 0.31 -0.80 0.37 1.00
## cor(beta_Time1,beta_Time2) 0.24 0.32 -0.42 0.79 1.00
## cor(beta_Time0,logalpha_Time0) -0.15 0.28 -0.69 0.39 1.00
## cor(beta_Time1,logalpha_Time0) 0.12 0.30 -0.46 0.69 1.00
## cor(beta_Time2,logalpha_Time0) 0.26 0.26 -0.28 0.73 1.00
## cor(beta_Time0,logalpha_Time1) -0.04 0.30 -0.63 0.51 1.00
## cor(beta_Time1,logalpha_Time1) 0.22 0.32 -0.41 0.78 1.00
## cor(beta_Time2,logalpha_Time1) 0.21 0.29 -0.38 0.73 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.43 0.24 -0.10 0.81 1.00
## cor(beta_Time0,logalpha_Time2) -0.14 0.33 -0.73 0.51 1.00
## cor(beta_Time1,logalpha_Time2) 0.25 0.33 -0.44 0.80 1.00
## cor(beta_Time2,logalpha_Time2) 0.10 0.33 -0.56 0.72 1.00
## cor(logalpha_Time0,logalpha_Time2) -0.01 0.28 -0.56 0.53 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.48 0.27 -0.16 0.88 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 974 1271
## sd(beta_Time1) 893 2184
## sd(beta_Time2) 875 1740
## sd(logalpha_Time0) 4352 5094
## sd(logalpha_Time1) 1374 1168
## sd(logalpha_Time2) 1141 909
## cor(beta_Time0,beta_Time1) 2204 3663
## cor(beta_Time0,beta_Time2) 1563 3249
## cor(beta_Time1,beta_Time2) 1039 2074
## cor(beta_Time0,logalpha_Time0) 1525 1901
## cor(beta_Time1,logalpha_Time0) 1350 2851
## cor(beta_Time2,logalpha_Time0) 2470 3864
## cor(beta_Time0,logalpha_Time1) 2109 2562
## cor(beta_Time1,logalpha_Time1) 1692 3571
## cor(beta_Time2,logalpha_Time1) 2737 4352
## cor(logalpha_Time0,logalpha_Time1) 5256 6195
## cor(beta_Time0,logalpha_Time2) 3336 3797
## cor(beta_Time1,logalpha_Time2) 3297 4336
## cor(beta_Time2,logalpha_Time2) 3712 5248
## cor(logalpha_Time0,logalpha_Time2) 4738 5176
## cor(logalpha_Time1,logalpha_Time2) 3092 2792
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept 0.43 1.27 -2.07 2.93 1.00 3194 4780
## beta_DepButter 0.03 0.08 -0.13 0.19 1.00 3091 4861
## beta_DepArm -0.10 0.09 -0.28 0.08 1.00 2167 3725
## beta_DepShore 0.45 0.08 0.29 0.60 1.00 3955 5605
## beta_DepLetter 0.42 0.06 0.31 0.54 1.00 6814 5760
## beta_DepQueen 0.25 0.07 0.12 0.39 1.00 5940 6103
## beta_DepCabin 0.22 0.07 0.09 0.36 1.00 4583 5579
## beta_DepPole 0.27 0.07 0.14 0.41 1.00 4509 5836
## beta_DepTicket 0.16 0.06 0.04 0.28 1.00 7110 5620
## beta_DepGrass 0.49 0.07 0.34 0.63 1.00 8125 6291
## beta_DepEngine 0.23 0.06 0.12 0.35 1.00 8283 6113
## beta_FreqSTX -0.40 0.29 -0.88 0.23 1.00 755 1996
## beta_Concrete -0.53 0.21 -0.95 -0.09 1.00 1614 2283
## beta_Density 0.15 1.73 -3.25 3.53 1.00 5654 5754
## beta_Diversity -0.24 0.23 -0.70 0.24 1.00 1048 2104
## beta_AoA -0.37 0.29 -0.89 0.23 1.00 938 2352
## beta_BOI 0.68 0.22 0.25 1.10 1.00 1311 2708
## beta_Phonemes 0.00 0.19 -0.38 0.41 1.00 1685 3051
## beta_Ambiguous 0.13 0.33 -0.52 0.81 1.01 1445 2434
## beta_NamingZ 0.33 1.50 -2.71 3.26 1.00 1673 2463
## logalpha_Intercept -0.31 0.82 -1.87 1.35 1.00 6305 5247
## logalpha_FreqSTX 0.41 0.37 -0.25 1.23 1.00 1376 1030
## logalpha_Concrete -0.22 0.28 -0.78 0.32 1.00 1835 3167
## logalpha_Density -0.23 0.94 -2.05 1.61 1.00 7705 5456
## logalpha_Diversity -0.66 0.36 -1.45 0.01 1.00 1156 1195
## logalpha_AoA -0.34 0.34 -0.99 0.35 1.00 1307 1362
## logalpha_BOI -0.07 0.32 -0.76 0.52 1.00 1263 1050
## logalpha_Phonemes -0.13 0.27 -0.70 0.42 1.00 1309 1736
## logalpha_Ambiguous 0.95 0.52 0.06 2.17 1.00 1096 916
## logalpha_NamingZ 0.70 0.92 -1.11 2.51 1.00 5617 5769
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
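Both sampler adjustments mentioned in this document, raising max_treedepth for this model and raising adapt_delta for the by-trial covariate model, are passed through the control argument of brm(). A sketch with the other fitting arguments elided:

```r
# Sampler tuning via brm()'s control list:
control_settings <- list(
  max_treedepth = 20,   # default is 10; raised after treedepth saturation
  adapt_delta   = 0.85  # default is 0.80; raised after a divergent transition
)
# Passed to a fit, e.g.:
# TwoPL_itmex <- brm(..., control = control_settings)
```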
Item Covariates by Trial
mcmc_plot(TwoPL_itmcp, type = "nuts_divergence")
mcmc_plot(TwoPL_itmcp, type = "nuts_treedepth")
mcmc_plot(TwoPL_itmcp, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_itmcp, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_itmcp, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_itmcp, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_itmcp, type = "neff_hist", binwidth = 0.1)
Under the default conditions, the 2PL model with item covariates varying across trials produced a single divergent transition after warmup. After increasing adapt_delta to 0.85, the model demonstrated no evidence of estimation problems that would threaten the validity of the results, so we can examine the model's overall results.
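For readers wanting to reproduce the adapt_delta adjustment described above, a minimal sketch is shown below; it assumes the fitted brmsfit object TwoPL_itmcp from the accompanying script.

```r
# Sketch only: refit the same model with a stricter step-size adaptation
# target. Raising adapt_delta above Stan's default of 0.80 makes the
# sampler take smaller steps, which usually removes sporadic divergent
# transitions at some cost in sampling speed.
library(brms)

TwoPL_itmcp <- update(TwoPL_itmcp, control = list(adapt_delta = 0.85))
```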
mcmc_plot(TwoPL_itmcp, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_itmcp, nsamples = 50, type = "bars")
pp_check(TwoPL_itmcp, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_itmcp, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_itmcp)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqSTX:Time + Concrete:Time + Density:Time + Diversity:Time + AoA:Time + BOI:Time + Phonemes:Time + Ambiguous:Time + NamingZ:Time + (-1 + Time | i | Item)
## logalpha ~ 1 + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.74 0.48 0.12 1.92 1.00 8482 5077
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 0.84 0.51 0.29 2.21 1.00
## sd(beta_Time1) 0.78 0.50 0.25 2.09 1.00
## sd(beta_Time2) 0.96 0.55 0.35 2.43 1.00
## sd(logalpha_Time0) 1.56 0.39 0.95 2.46 1.00
## sd(logalpha_Time1) 0.95 0.27 0.55 1.60 1.00
## sd(logalpha_Time2) 0.62 0.19 0.35 1.07 1.00
## cor(beta_Time0,beta_Time1) -0.13 0.36 -0.77 0.57 1.00
## cor(beta_Time0,beta_Time2) -0.18 0.35 -0.79 0.54 1.00
## cor(beta_Time1,beta_Time2) 0.19 0.36 -0.53 0.81 1.00
## cor(beta_Time0,logalpha_Time0) -0.29 0.31 -0.82 0.34 1.00
## cor(beta_Time1,logalpha_Time0) 0.20 0.30 -0.40 0.75 1.00
## cor(beta_Time2,logalpha_Time0) 0.28 0.30 -0.34 0.80 1.00
## cor(beta_Time0,logalpha_Time1) -0.22 0.30 -0.74 0.39 1.00
## cor(beta_Time1,logalpha_Time1) 0.29 0.31 -0.36 0.81 1.00
## cor(beta_Time2,logalpha_Time1) 0.29 0.30 -0.34 0.81 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.50 0.22 0.01 0.85 1.00
## cor(beta_Time0,logalpha_Time2) -0.06 0.32 -0.65 0.56 1.00
## cor(beta_Time1,logalpha_Time2) 0.23 0.33 -0.44 0.78 1.00
## cor(beta_Time2,logalpha_Time2) 0.18 0.32 -0.46 0.74 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.15 0.25 -0.35 0.63 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.58 0.22 0.07 0.90 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 2294 3504
## sd(beta_Time1) 1928 2735
## sd(beta_Time2) 2350 3575
## sd(logalpha_Time0) 8158 6761
## sd(logalpha_Time1) 5720 6319
## sd(logalpha_Time2) 5774 5696
## cor(beta_Time0,beta_Time1) 6431 5608
## cor(beta_Time0,beta_Time2) 6132 6336
## cor(beta_Time1,beta_Time2) 5529 6265
## cor(beta_Time0,logalpha_Time0) 2559 3798
## cor(beta_Time1,logalpha_Time0) 3281 5031
## cor(beta_Time2,logalpha_Time0) 2845 5349
## cor(beta_Time0,logalpha_Time1) 2595 3735
## cor(beta_Time1,logalpha_Time1) 2675 4203
## cor(beta_Time2,logalpha_Time1) 3592 5415
## cor(logalpha_Time0,logalpha_Time1) 8222 6865
## cor(beta_Time0,logalpha_Time2) 2662 5081
## cor(beta_Time1,logalpha_Time2) 2890 4239
## cor(beta_Time2,logalpha_Time2) 4168 5783
## cor(logalpha_Time0,logalpha_Time2) 8663 7221
## cor(logalpha_Time1,logalpha_Time2) 7299 7178
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept 0.46 0.88 -1.25 2.23 1.00 6631
## beta_DepButter 0.07 0.08 -0.09 0.22 1.00 7138
## beta_DepArm -0.12 0.09 -0.32 0.05 1.00 4158
## beta_DepShore 0.46 0.08 0.31 0.61 1.00 7041
## beta_DepLetter 0.41 0.06 0.29 0.53 1.00 14625
## beta_DepQueen 0.23 0.07 0.10 0.37 1.00 12981
## beta_DepCabin 0.20 0.07 0.07 0.33 1.00 7984
## beta_DepPole 0.23 0.07 0.10 0.36 1.00 14848
## beta_DepTicket 0.15 0.06 0.03 0.28 1.00 13921
## beta_DepGrass 0.47 0.08 0.32 0.62 1.00 18416
## beta_DepEngine 0.23 0.06 0.12 0.35 1.00 16121
## beta_FreqSTX:Time0 -0.42 0.39 -1.23 0.40 1.00 3742
## beta_FreqSTX:Time1 -0.66 0.37 -1.36 0.20 1.00 3733
## beta_FreqSTX:Time2 0.16 0.44 -0.75 1.10 1.00 3525
## beta_Time0:Concrete -0.57 0.45 -1.43 0.46 1.00 3366
## beta_Time1:Concrete -0.35 0.44 -1.23 0.55 1.00 3434
## beta_Time2:Concrete -0.40 0.49 -1.41 0.64 1.00 3996
## beta_Time0:Density -0.34 1.52 -3.39 2.68 1.00 8284
## beta_Time1:Density 0.20 1.52 -2.81 3.17 1.00 7837
## beta_Time2:Density 0.34 1.51 -2.65 3.31 1.00 7767
## beta_Time0:Diversity -0.59 0.44 -1.44 0.41 1.00 2781
## beta_Time1:Diversity 0.08 0.43 -0.79 0.98 1.00 3467
## beta_Time2:Diversity 0.11 0.50 -0.93 1.17 1.00 3920
## beta_Time0:AoA -0.24 0.47 -1.19 0.76 1.00 3021
## beta_Time1:AoA -0.52 0.46 -1.40 0.48 1.00 2935
## beta_Time2:AoA 0.09 0.53 -1.02 1.16 1.00 3844
## beta_Time0:BOI 0.90 0.41 -0.02 1.72 1.00 4022
## beta_Time1:BOI 0.54 0.39 -0.29 1.31 1.00 4207
## beta_Time2:BOI 0.46 0.46 -0.52 1.41 1.00 4603
## beta_Time0:Phonemes -0.10 0.43 -0.99 0.82 1.00 3211
## beta_Time1:Phonemes 0.09 0.41 -0.79 0.93 1.00 3491
## beta_Time2:Phonemes 0.10 0.49 -0.94 1.12 1.00 3531
## beta_Time0:Ambiguous 0.18 0.61 -1.17 1.39 1.00 3674
## beta_Time1:Ambiguous -0.11 0.58 -1.22 1.13 1.00 3674
## beta_Time2:Ambiguous 0.11 0.67 -1.26 1.52 1.00 4154
## beta_Time0:NamingZ 0.90 1.57 -2.20 3.93 1.00 4691
## beta_Time1:NamingZ -0.09 1.53 -3.11 2.85 1.00 6627
## beta_Time2:NamingZ -0.39 1.57 -3.47 2.72 1.00 7655
## logalpha_Intercept -0.22 0.71 -1.45 1.31 1.00 8736
## Tail_ESS
## beta_Intercept 5756
## beta_DepButter 6334
## beta_DepArm 5596
## beta_DepShore 6102
## beta_DepLetter 6442
## beta_DepQueen 6443
## beta_DepCabin 6441
## beta_DepPole 5960
## beta_DepTicket 6317
## beta_DepGrass 5768
## beta_DepEngine 5463
## beta_FreqSTX:Time0 3415
## beta_FreqSTX:Time1 3097
## beta_FreqSTX:Time2 3486
## beta_Time0:Concrete 3623
## beta_Time1:Concrete 3436
## beta_Time2:Concrete 3704
## beta_Time0:Density 6197
## beta_Time1:Density 6200
## beta_Time2:Density 6326
## beta_Time0:Diversity 3026
## beta_Time1:Diversity 3458
## beta_Time2:Diversity 4139
## beta_Time0:AoA 3574
## beta_Time1:AoA 3082
## beta_Time2:AoA 4068
## beta_Time0:BOI 4037
## beta_Time1:BOI 3706
## beta_Time2:BOI 3981
## beta_Time0:Phonemes 3336
## beta_Time1:Phonemes 2884
## beta_Time2:Phonemes 3777
## beta_Time0:Ambiguous 3490
## beta_Time1:Ambiguous 3895
## beta_Time2:Ambiguous 4181
## beta_Time0:NamingZ 6093
## beta_Time1:NamingZ 5989
## beta_Time2:NamingZ 6384
## logalpha_Intercept 4760
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
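As a reference for interpreting the beta and logalpha estimates in the summaries above, the nonlinear brms formula Resp ~ beta + exp(logalpha) * theta corresponds to a standard 2PL response function. A minimal base-R sketch follows; the parameter values in it are illustrative, not estimates from any fitted model.

```r
# Probability of recalling an item under the 2PL parameterization used here:
# beta is the item easiness, exp(logalpha) the (strictly positive)
# discrimination, and theta the person ability.
p_correct <- function(theta, beta, logalpha) {
  plogis(beta + exp(logalpha) * theta)  # inverse-logit link
}

# A person of average ability (theta = 0) on an item with easiness 0.5
# and logalpha = 0 (discrimination of 1):
p_correct(theta = 0, beta = 0.5, logalpha = 0)  # ~0.62
```

The exp(logalpha) term is why discrimination is modeled on the log scale in the formulas above: it guarantees a positive discrimination without constraining the sampler.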
Item Covariates with Unique Interactions
mcmc_plot(TwoPL_itmcr, type = "nuts_divergence")
mcmc_plot(TwoPL_itmcr, type = "nuts_treedepth")
mcmc_plot(TwoPL_itmcr, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_itmcr, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_itmcr, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_itmcr, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_itmcr, type = "neff_hist", binwidth = 0.1)
The 2PL model with collapsed trial interactions for the item covariates demonstrated no evidence of estimation problems that would threaten the validity of the results, so we can examine the model's overall results.
mcmc_plot(TwoPL_itmcr, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_itmcr, nsamples = 50, type = "bars")
pp_check(TwoPL_itmcr, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_itmcr, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_itmcr)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqHAL:Trial23 + Concrete:Trial23 + Diversity:Trial23 + AoA:Trial12 + BOI:Trial23 + (-1 + Time | i | Item)
## logalpha ~ 1 + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.72 0.47 0.12 1.86 1.00 7562 4929
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.24 0.44 0.65 2.33 1.00
## sd(beta_Time1) 0.35 0.22 0.07 0.92 1.00
## sd(beta_Time2) 0.90 0.31 0.46 1.66 1.00
## sd(logalpha_Time0) 1.61 0.39 0.98 2.48 1.00
## sd(logalpha_Time1) 1.00 0.29 0.57 1.67 1.00
## sd(logalpha_Time2) 0.63 0.19 0.35 1.09 1.00
## cor(beta_Time0,beta_Time1) 0.10 0.36 -0.57 0.75 1.00
## cor(beta_Time0,beta_Time2) -0.49 0.30 -0.92 0.19 1.00
## cor(beta_Time1,beta_Time2) -0.11 0.36 -0.74 0.60 1.00
## cor(beta_Time0,logalpha_Time0) -0.12 0.25 -0.59 0.36 1.00
## cor(beta_Time1,logalpha_Time0) -0.26 0.29 -0.77 0.33 1.00
## cor(beta_Time2,logalpha_Time0) 0.18 0.25 -0.32 0.63 1.00
## cor(beta_Time0,logalpha_Time1) -0.04 0.26 -0.54 0.46 1.00
## cor(beta_Time1,logalpha_Time1) 0.00 0.29 -0.57 0.56 1.00
## cor(beta_Time2,logalpha_Time1) 0.17 0.27 -0.36 0.68 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.51 0.22 0.01 0.85 1.00
## cor(beta_Time0,logalpha_Time2) -0.12 0.28 -0.64 0.42 1.00
## cor(beta_Time1,logalpha_Time2) 0.15 0.31 -0.48 0.70 1.00
## cor(beta_Time2,logalpha_Time2) 0.21 0.28 -0.35 0.70 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.19 0.25 -0.31 0.64 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.59 0.22 0.06 0.91 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 2515 3789
## sd(beta_Time1) 1123 1921
## sd(beta_Time2) 1766 2882
## sd(logalpha_Time0) 6649 5889
## sd(logalpha_Time1) 4630 5681
## sd(logalpha_Time2) 3861 4952
## cor(beta_Time0,beta_Time1) 3326 4473
## cor(beta_Time0,beta_Time2) 2144 3743
## cor(beta_Time1,beta_Time2) 3079 4115
## cor(beta_Time0,logalpha_Time0) 4146 5058
## cor(beta_Time1,logalpha_Time0) 1741 3437
## cor(beta_Time2,logalpha_Time0) 3440 4517
## cor(beta_Time0,logalpha_Time1) 4171 5221
## cor(beta_Time1,logalpha_Time1) 2186 2879
## cor(beta_Time2,logalpha_Time1) 2938 3786
## cor(logalpha_Time0,logalpha_Time1) 6876 6184
## cor(beta_Time0,logalpha_Time2) 5598 5503
## cor(beta_Time1,logalpha_Time2) 2587 3960
## cor(beta_Time2,logalpha_Time2) 4298 6321
## cor(logalpha_Time0,logalpha_Time2) 6686 6637
## cor(logalpha_Time1,logalpha_Time2) 5921 6744
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept 0.51 0.14 0.23 0.80 1.00 1343
## beta_DepButter 0.04 0.08 -0.12 0.19 1.00 3429
## beta_DepArm -0.09 0.08 -0.27 0.07 1.00 3475
## beta_DepShore 0.45 0.07 0.31 0.59 1.00 3903
## beta_DepLetter 0.41 0.06 0.29 0.52 1.00 7083
## beta_DepQueen 0.24 0.07 0.11 0.38 1.00 9626
## beta_DepCabin 0.21 0.06 0.08 0.33 1.00 6833
## beta_DepPole 0.24 0.06 0.12 0.37 1.00 8257
## beta_DepTicket 0.14 0.06 0.02 0.26 1.00 7260
## beta_DepGrass 0.48 0.07 0.34 0.62 1.00 8814
## beta_DepEngine 0.27 0.06 0.15 0.39 1.00 5415
## beta_FreqHAL:Trial230 0.07 0.47 -0.87 1.01 1.00 1795
## beta_FreqHAL:Trial231 -0.49 0.18 -0.77 -0.06 1.00 1261
## beta_Trial230:Concrete -0.48 0.51 -1.49 0.62 1.00 2659
## beta_Trial231:Concrete -0.54 0.18 -0.89 -0.17 1.00 2246
## beta_Trial230:Diversity -0.61 0.42 -1.45 0.23 1.00 2742
## beta_Trial231:Diversity -0.01 0.16 -0.30 0.34 1.00 1951
## beta_AoA:Trial120 -0.03 0.29 -0.64 0.51 1.00 2748
## beta_AoA:Trial121 -0.33 0.18 -0.63 0.07 1.00 1945
## beta_Trial230:BOI 0.78 0.50 -0.31 1.77 1.00 2504
## beta_Trial231:BOI 0.74 0.18 0.36 1.08 1.00 2098
## logalpha_Intercept -0.25 0.71 -1.48 1.32 1.00 7268
## Tail_ESS
## beta_Intercept 1677
## beta_DepButter 5123
## beta_DepArm 4481
## beta_DepShore 6092
## beta_DepLetter 6211
## beta_DepQueen 6419
## beta_DepCabin 6412
## beta_DepPole 6195
## beta_DepTicket 6307
## beta_DepGrass 5771
## beta_DepEngine 4417
## beta_FreqHAL:Trial230 3412
## beta_FreqHAL:Trial231 2061
## beta_Trial230:Concrete 3667
## beta_Trial231:Concrete 3140
## beta_Trial230:Diversity 3746
## beta_Trial231:Diversity 3357
## beta_AoA:Trial120 4376
## beta_AoA:Trial121 2343
## beta_Trial230:BOI 3735
## beta_Trial231:BOI 2696
## logalpha_Intercept 4923
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Reduced Item Covariates Model
mcmc_plot(TwoPL_itmsd, type = "nuts_divergence")
mcmc_plot(TwoPL_itmsd, type = "nuts_treedepth")
mcmc_plot(TwoPL_itmsd, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_itmsd, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_itmsd, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_itmsd, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_itmsd, type = "neff_hist", binwidth = 0.1)
The 2PL model with the reduced item covariates demonstrated no evidence of estimation problems that would threaten the validity of the results, so we can examine the model's overall results.
mcmc_plot(TwoPL_itmsd, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_itmsd, nsamples = 50, type = "bars")
pp_check(TwoPL_itmsd, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_itmsd, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_itmsd)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqHAL:Trial23 + Concrete + BOI + (-1 + Time | i | Item)
## logalpha ~ 1 + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.73 0.48 0.11 1.90 1.00 7194 4996
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.20 0.37 0.69 2.10 1.00
## sd(beta_Time1) 0.63 0.23 0.31 1.19 1.00
## sd(beta_Time2) 0.70 0.26 0.36 1.35 1.00
## sd(logalpha_Time0) 1.63 0.39 0.99 2.52 1.00
## sd(logalpha_Time1) 1.02 0.30 0.58 1.74 1.00
## sd(logalpha_Time2) 0.62 0.19 0.35 1.07 1.00
## cor(beta_Time0,beta_Time1) 0.05 0.31 -0.54 0.64 1.00
## cor(beta_Time0,beta_Time2) -0.52 0.25 -0.88 0.07 1.00
## cor(beta_Time1,beta_Time2) -0.10 0.30 -0.63 0.50 1.00
## cor(beta_Time0,logalpha_Time0) -0.12 0.24 -0.57 0.37 1.00
## cor(beta_Time1,logalpha_Time0) -0.07 0.26 -0.55 0.44 1.00
## cor(beta_Time2,logalpha_Time0) 0.27 0.25 -0.26 0.72 1.00
## cor(beta_Time0,logalpha_Time1) 0.08 0.26 -0.44 0.55 1.00
## cor(beta_Time1,logalpha_Time1) 0.07 0.26 -0.45 0.57 1.00
## cor(beta_Time2,logalpha_Time1) 0.31 0.27 -0.26 0.78 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.50 0.22 0.01 0.84 1.00
## cor(beta_Time0,logalpha_Time2) -0.08 0.27 -0.58 0.45 1.00
## cor(beta_Time1,logalpha_Time2) 0.30 0.26 -0.26 0.74 1.00
## cor(beta_Time2,logalpha_Time2) 0.29 0.26 -0.26 0.74 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.17 0.25 -0.33 0.63 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.57 0.22 0.05 0.90 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 1769 3143
## sd(beta_Time1) 2063 3750
## sd(beta_Time2) 1044 2187
## sd(logalpha_Time0) 5892 5679
## sd(logalpha_Time1) 4496 6054
## sd(logalpha_Time2) 4259 5653
## cor(beta_Time0,beta_Time1) 2301 4152
## cor(beta_Time0,beta_Time2) 3094 4856
## cor(beta_Time1,beta_Time2) 2539 3558
## cor(beta_Time0,logalpha_Time0) 3350 5421
## cor(beta_Time1,logalpha_Time0) 3208 5456
## cor(beta_Time2,logalpha_Time0) 1874 3070
## cor(beta_Time0,logalpha_Time1) 3720 4426
## cor(beta_Time1,logalpha_Time1) 4428 4968
## cor(beta_Time2,logalpha_Time1) 2243 5077
## cor(logalpha_Time0,logalpha_Time1) 7382 6238
## cor(beta_Time0,logalpha_Time2) 7321 6173
## cor(beta_Time1,logalpha_Time2) 4670 6106
## cor(beta_Time2,logalpha_Time2) 5016 5872
## cor(logalpha_Time0,logalpha_Time2) 6688 6748
## cor(logalpha_Time1,logalpha_Time2) 5772 6418
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept 0.57 0.17 0.22 0.89 1.01 818
## beta_DepButter 0.06 0.08 -0.09 0.21 1.00 4585
## beta_DepArm -0.11 0.09 -0.28 0.07 1.00 3654
## beta_DepShore 0.44 0.08 0.29 0.59 1.00 4416
## beta_DepLetter 0.42 0.06 0.31 0.53 1.00 11617
## beta_DepQueen 0.25 0.07 0.11 0.39 1.00 8715
## beta_DepCabin 0.21 0.07 0.08 0.33 1.00 6436
## beta_DepPole 0.23 0.06 0.10 0.36 1.00 9885
## beta_DepTicket 0.14 0.06 0.02 0.26 1.00 8377
## beta_DepGrass 0.49 0.07 0.36 0.63 1.00 8994
## beta_DepEngine 0.23 0.05 0.12 0.34 1.00 11609
## beta_Concrete -0.47 0.13 -0.73 -0.21 1.00 2209
## beta_BOI 0.69 0.14 0.44 0.98 1.00 1608
## beta_FreqHAL:Trial230 -0.10 0.37 -0.85 0.63 1.00 1478
## beta_FreqHAL:Trial231 -0.23 0.18 -0.57 0.12 1.00 1226
## logalpha_Intercept -0.26 0.73 -1.51 1.36 1.00 6666
## Tail_ESS
## beta_Intercept 1205
## beta_DepButter 5451
## beta_DepArm 5400
## beta_DepShore 5102
## beta_DepLetter 6071
## beta_DepQueen 6027
## beta_DepCabin 6217
## beta_DepPole 6012
## beta_DepTicket 6194
## beta_DepGrass 6173
## beta_DepEngine 6447
## beta_Concrete 2643
## beta_BOI 2093
## beta_FreqHAL:Trial230 2432
## beta_FreqHAL:Trial231 2340
## logalpha_Intercept 5130
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Final Item Covariate Model
mcmc_plot(TwoPL_itmfn, type = "nuts_divergence")
mcmc_plot(TwoPL_itmfn, type = "nuts_treedepth")
mcmc_plot(TwoPL_itmfn, type = "trace", regex_pars = "b_")
## No divergences to plot.
mcmc_plot(TwoPL_itmfn, type = "trace", regex_pars = "sd_")
## No divergences to plot.
mcmc_plot(TwoPL_itmfn, type = "trace", regex_pars = "cor_")
## No divergences to plot.
mcmc_plot(TwoPL_itmfn, type = "rhat_hist", binwidth = 0.0001)
mcmc_plot(TwoPL_itmfn, type = "neff_hist", binwidth = 0.1)
The 2PL model with the final reduction of the item covariates demonstrated no evidence of estimation problems that would threaten the validity of the results, so we can examine the model's overall results.
mcmc_plot(TwoPL_itmfn, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")
pp_check(TwoPL_itmfn, nsamples = 50, type = "bars")
pp_check(TwoPL_itmfn, nsamples = 50, type = "bars_grouped", group = "Item")
pp_check(TwoPL_itmfn, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))
Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
summary(TwoPL_itmfn)
## Family: bernoulli
## Links: mu = logit
## Formula: Resp ~ beta + exp(logalpha) * theta
## theta ~ 0 + (1 | ID)
## beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqHAL:Trial23 + AoA:Trial12 + BOI + Concrete + (-1 + Time | i | Item)
## logalpha ~ 1 + (-1 + Time | i | Item)
## Data: df_long (Number of observations: 36570)
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~ID (Number of levels: 1219)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept) 0.73 0.47 0.12 1.91 1.00 7412 5016
##
## ~Item (Number of levels: 10)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0) 1.40 0.40 0.81 2.34 1.00
## sd(beta_Time1) 0.32 0.20 0.06 0.84 1.00
## sd(beta_Time2) 0.89 0.29 0.44 1.58 1.00
## sd(logalpha_Time0) 1.61 0.40 0.98 2.51 1.00
## sd(logalpha_Time1) 1.02 0.29 0.59 1.70 1.00
## sd(logalpha_Time2) 0.61 0.19 0.34 1.09 1.00
## cor(beta_Time0,beta_Time1) 0.18 0.35 -0.51 0.80 1.00
## cor(beta_Time0,beta_Time2) -0.52 0.29 -0.91 0.17 1.00
## cor(beta_Time1,beta_Time2) -0.15 0.36 -0.76 0.58 1.00
## cor(beta_Time0,logalpha_Time0) -0.09 0.23 -0.54 0.36 1.00
## cor(beta_Time1,logalpha_Time0) -0.23 0.29 -0.76 0.36 1.00
## cor(beta_Time2,logalpha_Time0) 0.19 0.24 -0.29 0.62 1.00
## cor(beta_Time0,logalpha_Time1) 0.06 0.25 -0.44 0.53 1.00
## cor(beta_Time1,logalpha_Time1) 0.03 0.28 -0.53 0.55 1.00
## cor(beta_Time2,logalpha_Time1) 0.20 0.26 -0.32 0.66 1.00
## cor(logalpha_Time0,logalpha_Time1) 0.51 0.21 0.03 0.84 1.00
## cor(beta_Time0,logalpha_Time2) -0.14 0.27 -0.63 0.42 1.00
## cor(beta_Time1,logalpha_Time2) 0.17 0.30 -0.44 0.71 1.00
## cor(beta_Time2,logalpha_Time2) 0.24 0.28 -0.34 0.72 1.00
## cor(logalpha_Time0,logalpha_Time2) 0.19 0.25 -0.31 0.65 1.00
## cor(logalpha_Time1,logalpha_Time2) 0.58 0.23 0.05 0.91 1.00
## Bulk_ESS Tail_ESS
## sd(beta_Time0) 2433 3468
## sd(beta_Time1) 1195 1616
## sd(beta_Time2) 1625 2354
## sd(logalpha_Time0) 6103 6517
## sd(logalpha_Time1) 4922 6086
## sd(logalpha_Time2) 5038 6050
## cor(beta_Time0,beta_Time1) 3586 4992
## cor(beta_Time0,beta_Time2) 1817 3708
## cor(beta_Time1,beta_Time2) 2395 4081
## cor(beta_Time0,logalpha_Time0) 4537 5640
## cor(beta_Time1,logalpha_Time0) 1841 3291
## cor(beta_Time2,logalpha_Time0) 3460 4079
## cor(beta_Time0,logalpha_Time1) 5202 5551
## cor(beta_Time1,logalpha_Time1) 2920 4875
## cor(beta_Time2,logalpha_Time1) 3622 4573
## cor(logalpha_Time0,logalpha_Time1) 6797 6956
## cor(beta_Time0,logalpha_Time2) 6741 6356
## cor(beta_Time1,logalpha_Time2) 3391 5140
## cor(beta_Time2,logalpha_Time2) 4763 6157
## cor(logalpha_Time0,logalpha_Time2) 6863 5976
## cor(logalpha_Time1,logalpha_Time2) 5907 6416
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept 0.55 0.12 0.33 0.81 1.00 1491
## beta_DepButter 0.05 0.07 -0.08 0.19 1.00 3452
## beta_DepArm -0.10 0.08 -0.27 0.06 1.00 3395
## beta_DepShore 0.44 0.07 0.30 0.58 1.00 3522
## beta_DepLetter 0.40 0.06 0.28 0.52 1.00 6632
## beta_DepQueen 0.25 0.07 0.12 0.39 1.00 8389
## beta_DepCabin 0.20 0.06 0.08 0.33 1.00 5886
## beta_DepPole 0.25 0.06 0.12 0.37 1.00 7173
## beta_DepTicket 0.14 0.06 0.02 0.25 1.00 5037
## beta_DepGrass 0.47 0.07 0.34 0.61 1.00 6971
## beta_DepEngine 0.27 0.06 0.15 0.39 1.00 4899
## beta_BOI 0.70 0.12 0.46 0.94 1.00 2220
## beta_Concrete -0.49 0.11 -0.71 -0.26 1.00 2832
## beta_FreqHAL:Trial230 0.04 0.48 -0.91 0.94 1.01 1130
## beta_FreqHAL:Trial231 -0.47 0.17 -0.74 -0.07 1.00 1148
## beta_AoA:Trial120 0.10 0.30 -0.53 0.66 1.00 2139
## beta_AoA:Trial121 -0.34 0.15 -0.63 0.00 1.00 2101
## logalpha_Intercept -0.24 0.71 -1.45 1.32 1.00 7352
## Tail_ESS
## beta_Intercept 2004
## beta_DepButter 5075
## beta_DepArm 4799
## beta_DepShore 4761
## beta_DepLetter 5113
## beta_DepQueen 5696
## beta_DepCabin 6414
## beta_DepPole 5788
## beta_DepTicket 5464
## beta_DepGrass 5961
## beta_DepEngine 3370
## beta_BOI 3000
## beta_Concrete 3392
## beta_FreqHAL:Trial230 2169
## beta_FreqHAL:Trial231 1875
## beta_AoA:Trial120 4035
## beta_AoA:Trial121 2297
## logalpha_Intercept 4750
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Model Comparisons
Since not all models were compared to one another, the results shown here are best understood in the context of the modeling flowchart. To help with understanding the modeling choices, some discussion of how each comparison's results were interpreted and used to inform the next model fit is provided after each comparison. Again, the comparison of models was based on a combination of the LOOIC and pseudo-BMA. For those unfamiliar with these methods, a brief description of each criterion is provided here.
The LOOIC (leave-one-out cross-validation information criterion) serves a similar function to other information criteria such as the AIC or BIC. The LOOIC aims to estimate the expected log predictive density of a model fit on new data, or framed another way, the expected model performance in out-of-sample prediction. The computation of the LOOIC assumes that any single observation can be omitted without significantly altering the posterior distribution, which is where the idea of a leave-one-out criterion comes from. Comparing LOOIC values thus directly compares models on their expected out-of-sample predictive accuracy. Like other information criteria, a lower LOOIC corresponds to better model performance. An important caveat to the LOOIC is that brms and related Stan packages (e.g., loo) favor reporting the actual expected log predictive density (ELPD) differences rather than the more traditional information criterion statistic (-2*ELPD). The multiplication of the ELPD by -2 is done for historical reasons, as this simple transformation causes differences in the statistic to follow a chi-squared distribution, making it possible to conduct likelihood ratio tests. In a Bayesian framework, access to the posterior permits direct comparison (e.g., \(ELPD_{Model1}-ELPD_{Model2}\)) since the standard error of the posterior of this difference can also be computed. Using that standard error, a "significant" difference in model performance can be declared when \(|ELPD_{Diff}| > 1.96\times{SE}\), or more informally, by checking whether the difference in ELPD is at least twice as large as its standard error.
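To make this decision rule concrete, the following base-R sketch applies it to illustrative numbers; these values are made up and do not come from any of the fitted models.

```r
# ELPD difference and its standard error, in the format loo_compare()
# reports them (illustrative values only).
elpd_diff <- -25.0
se_diff   <- 5.0

# The traditional information-criterion scale is just -2 * ELPD,
# so differences translate directly between the two scales.
looic_diff <- -2 * elpd_diff

# Informal check: is the difference at least twice its standard error?
abs(elpd_diff) > 2 * se_diff  # TRUE here, so the models differ reliably
```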
Pseudo-BMA (pseudo-Bayesian model averaging) is a form of model stacking/weighting/averaging. In this particular framework, the aim of the pseudo-BMA method is to weight models by their expected log predictive density. A model given greater weight would be preferred, while models sharing similar weights would suggest that no single model is superior.
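In its simplest form, a pseudo-BMA weight is just a softmax of the models' ELPD estimates. The sketch below uses illustrative numbers and omits the Bayesian-bootstrap regularization that loo applies by default, so its values will not match the tables below exactly.

```r
# ELPD estimates for two hypothetical models (illustrative values only)
elpd <- c(ModelA = -100, ModelB = -106)

# Pseudo-BMA weight: exponentiate each ELPD and normalize; subtracting
# the maximum first avoids numerical underflow with large negative ELPDs.
weights <- exp(elpd - max(elpd)) / sum(exp(elpd - max(elpd)))
round(weights, 4)  # ModelA receives nearly all the weight
```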
ModelComparisons <- readRDS("ModelComparisons.rds")
matrix(c(ModelComparisons$LOOIC$Comparison.1[, 1:2],
ModelComparisons$BMAs$Comparison.1[2:1]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("2PL", "Rasch"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Rasch and 2PL Intercept Models", digits = c(2, 2, 59), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

|  | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| 2PL | 0.00 | 0.00 | 1.00e+00 |
| Rasch | -191.97 | 24.23 | 9.92e-57 |
In the comparison of the Rasch and 2PL models, both the LOOIC and the pseudo-BMA model weights agree that the 2PL model is preferable. As a quick clarification, it may appear at first glance that the Rasch model had the smaller LOOIC since the ELPD difference listed for it is negative. It is important to remember that this difference is always computed by subtracting the larger ELPD from the smaller one, so the negative value reflects that the Rasch model had a smaller ELPD than the 2PL model, in the same way that the difference between 5 and 10 is -5. The ELPD difference is about 8 times larger than its standard error, which supports a genuine difference in predictive performance. This is congruent with the pseudo-BMA weights, wherein the 2PL model is given essentially all the weight in the comparison of the two models.
matrix(c(ModelComparisons$LOOIC$Comparison.2[, 1:2],
ModelComparisons$BMAs$Comparison.2[sapply(row.names(ModelComparisons$LOOIC$Comparison.2), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.2)$names)})]),
nrow = 5, ncol = 3, byrow = FALSE,
dimnames = list(c("Varying Trials, Fixed Intercept", "Varying Trials, Fixed and Random Intercept", "Trial 1, Fixed Intercept", "Trial 1, Fixed and Random Intercepts", "Fixed Trials, Fixed and Random Intercept"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the 2PL Intercept Models", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

|  | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Varying Trials, Fixed Intercept | 0.00 | 0.00 | 0.98 |
| Varying Trials, Fixed and Random Intercept | -6.15 | 2.26 | 0.02 |
| Trial 1, Fixed Intercept | -354.86 | 26.46 | 0.00 |
| Trial 1, Fixed and Random Intercepts | -355.25 | 26.48 | 0.00 |
| Fixed Trials, Fixed and Random Intercept | -2050.21 | 61.17 | 0.00 |
Comparing the various specifications of the intercept-only 2PL models also demonstrates clear concordance of the LOOIC and pseudo-BMA model weights. A model that allows items to vary over all three trials and specifies only a fixed (rather than a fixed and random) intercept for both difficulty and discrimination is the most preferred model.
matrix(c(ModelComparisons$LOOIC$Comparison.3[, 1:2],
ModelComparisons$BMAs$Comparison.3[sapply(row.names(ModelComparisons$LOOIC$Comparison.3), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.3)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Local Dependency", "Varying Trials, Fixed Intercept"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the 2PL Intercept and Local Dependency Models", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

|  | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Local Dependency | 0.00 | 0.00 | 0.95 |
| Varying Trials, Fixed Intercept | -26.96 | 15.92 | 0.05 |
Comparing the preferred intercept model to one that includes the dependency of recalling items over trials, the LOOIC and pseudo-BMA weights support preferring the local dependency effect.
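To make the local dependency specification concrete, a 2PL with a dependency covariate can be written in brms's non-linear syntax roughly as follows. This is a sketch with placeholder variable names (`Response`, `Recalled`, `Item`, `Person`), not the study's exact code:

```r
library(brms)

# 2PL IRT in non-linear brms syntax: easiness (eta) and log discrimination
# (logalpha) each receive item-level effects; `Recalled` is a placeholder
# indicator of whether the word was produced on a prior trial (the local
# dependency covariate), here entering both parameters.
formula_2pl_ld <- bf(
  Response ~ exp(logalpha) * eta,
  eta      ~ 1 + Recalled + (1 |i| Item) + (1 | Person),
  logalpha ~ 1 + Recalled + (1 |i| Item),
  nl = TRUE
)
# fit <- brm(formula_2pl_ld, data = d, family = bernoulli(link = "logit"))
```

The `|i|` term correlates the item-level effects across the two non-linear parameters.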
matrix(c(ModelComparisons$LOOIC$Comparison.4[, 1:2],
ModelComparisons$BMAs$Comparison.4[sapply(row.names(ModelComparisons$LOOIC$Comparison.4), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.4)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Dependency on Easiness Only", "Local Dependency"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Local Dependency Model on both Parameters vs Easiness Only", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Dependency on Easiness Only | 0.00 | 0.00 | 0.99 |
| Local Dependency | -12.92 | 5.88 | 0.01 |
Dropping the local dependency effect on the items' discrimination parameter resulted in a model preferred by both the LOOIC and the pseudo-BMA model weights. This indicates that an item's discrimination varies across trials in the same manner regardless of whether the word was recalled earlier, suggesting that trial position matters more than prior recall for determining discrimination. In contrast, recalling a word on an earlier trial does impact its easiness on a later trial.
matrix(c(ModelComparisons$LOOIC$Comparison.5[, 1:2],
ModelComparisons$BMAs$Comparison.5[sapply(row.names(ModelComparisons$LOOIC$Comparison.5), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.5)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Easiness Only", "Uniform Dependency"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Item-specific vs Uniform Local Dependency Models", digits = c(2, 2, 14), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Easiness Only | 0.00 | 0.00 | 1.00e+00 |
| Uniform Dependency | -44.97 | 8.69 | 3.03e-12 |
Assuming a uniform dependency effect across all words resulted in poorer model performance per both the LOOIC and the pseudo-BMA weights. This result indicates that there is a unique interaction between prior learning and the words themselves. If it were simply the case that learning a word on a prior trial produced the same change in its likelihood of being recalled later, then the uniform dependency effect would have been preferred. It appears instead that some words benefit from being learned earlier while others do not, which is consistent with the fact that not all of the word-specific local dependency effects are credibly different from zero.
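The distinction between the two specifications is whether the dependency effect is shared or word-specific. In brms terms, the easiness predictor differs roughly as follows (a sketch with placeholder names, not the study's exact code):

```r
# Item-specific dependency: the effect of prior recall varies by word
eta_item_specific <- eta ~ 1 + (1 + Recalled |i| Item) + (1 | Person)

# Uniform dependency: a single shared effect of prior recall across all words
eta_uniform <- eta ~ 1 + Recalled + (1 |i| Item) + (1 | Person)
```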
matrix(c(ModelComparisons$LOOIC$Comparison.6[, 1:2],
ModelComparisons$BMAs$Comparison.6[sapply(row.names(ModelComparisons$LOOIC$Comparison.6), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.6)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Local Dependency", "Learning/Growth"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Local Dependency and Learning Models", digits = c(2, 2, 3), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Local Dependency | 0.00 | 0.00 | 0.997 |
| Learning/Growth | -40.31 | 14.71 | 0.003 |
Comparing the local dependency model (with item-specific effects on easiness only) to a model that assumes a person-specific growth curve across the trials demonstrates that the dependency approach is preferred by both the LOOIC and pseudo-BMA weights. For clarification, the growth curve/learning model treats each trial as measuring a multidimensional trait that changes over trials (e.g., rather than a unidimensional “memory” trait this models something like a “learning” trait that grows over the three trials).
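For clarity, the growth-curve alternative can be sketched as a person-specific trend over trials in place of the dependency covariate (placeholder names, not the study's exact code):

```r
library(brms)

# Growth/learning model: each person's latent trait changes over trials via
# a correlated random slope on Trial; there is no dependency covariate.
formula_2pl_growth <- bf(
  Response ~ exp(logalpha) * eta,
  eta      ~ 1 + (1 + Trial | Person) + (1 |i| Item),
  logalpha ~ 1 + (1 |i| Item),
  nl = TRUE
)
```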
matrix(c(ModelComparisons$LOOIC$Comparison.7[, 1:2],
ModelComparisons$BMAs$Comparison.7[sapply(row.names(ModelComparisons$LOOIC$Comparison.7), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.7)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Local Dependency", "Multidimensional"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Local Dependency and Multidimensional Models", digits = c(2, 2, 3), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Local Dependency | 0.00 | 0.0 | 0.998 |
| Multidimensional | -45.72 | 14.7 | 0.002 |
Similarly, comparing the local dependency model to a model that assumes each trial corresponds to its own oblique factor/trait reveals a clear preference for the local dependency model. The results of this and the previous model comparison suggest that the CERAD list learning test is perhaps best thought of as unidimensional with test-specific relationships between items. Part of the advantage of the local dependency effects is that they are dynamic and person-specific. In fact, the local dependency effects behave more like a differential item functioning term, as they are an interaction between items and persons. In other words, the expected easiness of "Shore" on trial 3 differs between a person who recalled "Shore" on the two earlier trials and someone who did not. The local dependency effects thus encode learning (i.e., values of the local dependency variable increase as a word is learned) and dynamic performance (i.e., the expectation for responses changes after each trial).
matrix(c(ModelComparisons$LOOIC$Comparison.8[, 1:2],
ModelComparisons$BMAs$Comparison.8[sapply(row.names(ModelComparisons$LOOIC$Comparison.8), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.8)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Local Dependency", "Serial Position"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Local Dependency and Serial Position Models", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Local Dependency | 0.0 | 0.00 | 0.68 |
| Serial Position | -0.8 | 0.55 | 0.32 |
Including a fixed effect for the words' order of presentation did not produce a clearly superior model to the local dependency model. The LOOICs are not compellingly different between the two models, and the local dependency model receives about two-thirds of the overall pseudo-BMA weight.
matrix(c(ModelComparisons$LOOIC$Comparison.9[, 1:2],
ModelComparisons$BMAs$Comparison.9[sapply(row.names(ModelComparisons$LOOIC$Comparison.9), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.9)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Local Dependency", "Serial Position by Trial"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Local Dependency and Serial Position (varying by trial) Models", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Local Dependency | 0.00 | 0.00 | 0.74 |
| Serial Position by Trial | -1.12 | 0.55 | 0.26 |
When the serial position effect is permitted to vary by trial, the LOOIC and pseudo-BMA weights more clearly favor the standard local dependency specification. It was not expected that varying effects for item position by trial would improve model fit; rather, this model was fit to test whether the serial position effect was more pronounced on a particular trial and hence might require special modeling. No such support was found. Shown here are the conditional effects plots for item position in both serial position models:
conditional_effects(TwoPL_srlps, effects = "ItemPos")
conditional_effects(TwoPL_t3spe, effects = "ItemPos:Time")

As is readily apparent from these plots, item position shows none of the quadratic curvature characteristic of a serial position effect. The plots likewise demonstrate that a word's position on the list exerts no clear effect on recall probability, and allowing a unique effect for each trial does not help, as the trial-specific estimates are all nearly indistinguishable from one another.
matrix(c(ModelComparisons$LOOIC$Comparison.10[, 1:2],
ModelComparisons$BMAs$Comparison.10[sapply(row.names(ModelComparisons$LOOIC$Comparison.10), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.10)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Local Dependency", "All Item Covariates"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Local Dependency and Complete Item Covariates Models", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Local Dependency | 0.00 | 0.00 | 0.94 |
| All Item Covariates | -4.98 | 2.91 | 0.06 |
Including all explanatory item covariates in the model did not produce better fit than including only the item-specific dependency effects. This is not a surprising result: although the covariates were all hypothesized to be relevant, a model is rarely improved by including many predictors at once. As demonstrated in the model summaries for the explanatory item covariates model in the previous section, many of the item-level predictors are not credibly different from zero, meaning that performance could potentially be improved by removing some of these variables in search of a more parsimonious model. One clear conclusion from the summary of the item covariate model was that, like the local dependency effects, these variables impacted only easiness, so item effects on discrimination are not evaluated from this point forward.
matrix(c(ModelComparisons$LOOIC$Comparison.11[, 1:2],
ModelComparisons$BMAs$Comparison.11[sapply(row.names(ModelComparisons$LOOIC$Comparison.11), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.11)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Local Dependency", "All Item Covariates by Trials"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Local Dependency and Item Covariates varying by Trial Models", digits = c(3, 2, 3), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Local Dependency | 0.000 | 0.00 | 0.497 |
| All Item Covariates by Trials | -0.002 | 1.23 | 0.503 |
Somewhat paradoxically, including more complex interactions improved overall performance to the point of being equivalent to the simpler local dependency model. Part of this is likely due to dropping the effects on discrimination. What is unique about explanatory item covariates is that they essentially partition the variance of a known effect (in this case, item difficulty). As a result, item covariates do not add explanatory power but instead explain already-estimated parameters; a significant overall improvement from adding item covariates is therefore unlikely, since in most cases they only add complexity. On this basis, the more complex but equally well-performing item covariate model was carried forward instead of the local dependency model. Regardless, the point of this fit and comparison was to identify which effects might be worth including in a final model. Shown below are the conditional effects plots for the item covariate effects by trial:
plot(conditional_effects(TwoPL_itmcp, effects = c("FreqSTX:Time", "Concrete:Time", "Density:Time", "Diversity:Time", "AoA:Time", "BOI:Time", "Phonemes:Time", "Ambiguous:Time", "NamingZ:Time")), ask = FALSE)

Using these plots and the posterior estimates of the coefficients, a new model specification was developed that reduces the number of item predictors and allows some trial-specific effects. For the sake of transparency, a predictor was retained only if its 95% credible interval excluded zero; the model's posterior summaries described in the previous section can be referenced to see which predictors included zero in their intervals. Whether a unique trial effect was warranted was then judged from the plots. Consider the plots for Age of Acquisition (AoA): they make clear that the estimated effect of AoA is negative in the first and second trials but positive in the final trial. Based on this visualization, the effect of AoA was collapsed across trials 1 and 2, with a separate effect allowed for trial 3.
matrix(c(ModelComparisons$LOOIC$Comparison.12[, 1:2],
ModelComparisons$BMAs$Comparison.12[sapply(row.names(ModelComparisons$LOOIC$Comparison.12), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.12)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("All Item Covariates by Trials", "Reduced Item Covariates"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Complete Item Covariates varying by Trials and Reduced Item Covariate Models", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| All Item Covariates by Trials | 0.0 | 0.00 | 0.7 |
| Reduced Item Covariates | -1.1 | 1.36 | 0.3 |
The reduced model does not improve on the more complicated model. From the perspective of the LOOIC, the two models are similar; however, the pseudo-BMA weights indicate that the complete model makes better predictions. To help understand why, the conditional effects plots can again be examined:
plot(conditional_effects(TwoPL_itmcr, effects = c("Concrete:Trial23", "BOI:Trial23", "Diversity:Trial23", "FreqHAL:Trial23", "AoA:Trial12")), ask = FALSE)

Some needed alterations to the model are clear from these plots. For example, the estimated effects for concreteness and body-object integration are clearly equivalent across all trials, meaning these variables are likely best summarized by a single effect. Whereas word frequency from the SUBTLEX database was not a credible predictor of word easiness, the frequency value from the HAL database appears to perform better. As can be confirmed in the previous section, this model finds that semantic diversity and AoA are both not credibly different from zero, so they can reasonably be dropped. Of some note (again confirmable in the model summary of the previous section), AoA is close to a credible effect, with an upper credible interval bound of essentially zero.
matrix(c(ModelComparisons$LOOIC$Comparison.13[, 1:2],
ModelComparisons$BMAs$Comparison.13[sapply(row.names(ModelComparisons$LOOIC$Comparison.13), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.13)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Simplified Item Covariates", "All Item Covariates by Trials"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Complete Item Covariates varying by Trials and Simplified Item Covariate Models", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Simplified Item Covariates | 0.0 | 0.00 | 0.54 |
| All Item Covariates by Trials | -0.2 | 1.19 | 0.46 |
A simplified model that fit a single effect for concreteness and BOI, dropped AoA and semantic diversity, and used the HAL word frequency was estimated based on the above results. Compared to the model with all item covariates varying by trial, this simplified model performed essentially identically, with no preference indicated by either the LOOIC or the pseudo-BMA weights. In the interest of parsimony, the simplified model was taken forward. Recall, however, that AoA had an effect bordering on credibly different from zero. Given that the simplified model has fewer predictors, another model was fit with AoA returned to see whether removing the other predictors would clarify its effect.
matrix(c(ModelComparisons$LOOIC$Comparison.14[, 1:2],
ModelComparisons$BMAs$Comparison.14[sapply(row.names(ModelComparisons$LOOIC$Comparison.14), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.14)$names)})]),
nrow = 2, ncol = 3, byrow = FALSE,
dimnames = list(c("Final Model", "Simplified Item Covariates"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
kable(caption = "Comparison of the Final and Simplified Item Covariate Models", digits = c(2, 2, 2), align = 'ccc') %>%
kable_classic(full_width = FALSE, position = "float_right")

| | ELPD Difference | Standard Error of Difference | Pseudo-BMA Weight |
|---|---|---|---|
| Final Model | 0.00 | 0.00 | 0.62 |
| Simplified Item Covariates | -0.71 | 1.16 | 0.38 |
Adding AoA back in causes no loss of parsimony per the LOOIC while contributing to an improvement in the model's predictions. As can be confirmed in the previous section, the AoA effect is more clearly non-zero in the final model.
Final Model Details
This final section provides further details on the final item covariate model identified in this study. To begin, some additional details regarding the model's performance and fit are provided. The manuscript reports summaries of coefficients as well as item and person fit statistics, and a previous section of this supplement provided visual summaries of the coefficients. To further enrich the visual understanding of the final model, the following plots describe item and person fit:
These two plots show the empirical distribution of the differences in log-likelihood between the predicted and observed data under the models. The first plot is grouped by items and reflects item fit; in an ideal model, this distribution is approximately normal with a mean of 0. These plots are manageable since there are only 10 items on the CERAD, but the same is not true for person fit, which would require displaying all 1219 individuals. To address this impracticality, only the most extreme case (the individual with the worst fit) is shown. While the aim of this study was to examine predictors of item traits, it is still possible to visualize person traits (i.e., memory). The following plot shows the ordered distribution of latent trait estimates and their 95% credible intervals for each participant in the study:
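A plot of this kind can be constructed from the fitted model's person-level random effects. The following is a sketch (names hypothetical, not the study's script), assuming a fitted non-linear brms model `fit_final` in which the person intercept on the easiness predictor serves as the latent trait:

```r
# Posterior means and 95% intervals of the person intercepts (latent trait)
theta <- ranef(fit_final, probs = c(0.025, 0.975))$Person[, , "eta_Intercept"]

# Order persons by their estimated trait, then plot estimates with intervals
ord <- order(theta[, "Estimate"])
plot(theta[ord, "Estimate"], pch = 16, cex = 0.3,
     xlab = "Person (ordered)", ylab = "Latent trait estimate")
segments(x0 = seq_along(ord),
         y0 = theta[ord, "Q2.5"], y1 = theta[ord, "Q97.5"], col = "grey70")
```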
The plot demonstrates that, on the whole, the interval estimates of a person's latent trait are fairly wide. This is not entirely surprising given that the latent trait is estimated only as a random intercept for each person. The explanatory item response theory model will be expanded in a future study to include factors that predict the latent trait, which may reduce uncertainty in its measurement. Still, the largest limit on precision is that memory is measured with just 10 items repeated only 3 times, so person predictors are unlikely to be a major source of error reduction. Note as well that, like the item plots in the manuscript, these latent trait estimates do not include any effect of local dependency, since doing so would significantly increase the number of plots needed to characterize every possible combination of learned items over the 3 trials. It is also potentially useful to visualize all the conditional effects from the final model. These plots are shown next:
There are also some general test descriptives that are usually helpful to examine: test reliability, the expected score function, and test information. As with the other plots, these estimates exclude the local dependency effects, so the plots only approximate the true values. The visualizations follow in that order:
Within IRT, information and reliability are very closely related, and the two plots share generally the same shape, peaking at slightly below-average ability. As both make clear, the CERAD word list is not ideal for measuring individuals with either very low or very high memory ability. In dementia/mild cognitive impairment research, this property may actually be desirable, as the goal is often to characterize mildly impaired memory with greater precision, with relatively little need to measure the extremes carefully. The expected score function is a useful visual analogue for the relationship between raw score and latent trait estimate: the relationship is clearly monotonic and overall very gradual, which means that the raw scores span a relatively wide range of latent trait values.
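For readers unfamiliar with these descriptives, the 2PL test information function is I(θ) = Σᵢ aᵢ² Pᵢ(θ)(1 − Pᵢ(θ)). A self-contained sketch using illustrative (not estimated) item parameters:

```r
# Test information for a 2PL: sum over items of a^2 * P * (1 - P),
# where P is the item response probability at latent trait theta.
twopl_info <- function(theta, a, b) {
  p <- plogis(a * (theta - b))
  sum(a^2 * p * (1 - p))
}

theta_grid <- seq(-4, 4, length.out = 101)
info <- sapply(theta_grid, twopl_info,
               a = rep(1, 10),                  # illustrative discriminations
               b = seq(-2, 2, length.out = 10)) # illustrative difficulties
plot(theta_grid, info, type = "l",
     xlab = "Latent trait", ylab = "Test information")
```

With these illustrative parameters the curve peaks near the middle of the trait range, mirroring the shape described above.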